You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@lucene.apache.org by ct...@apache.org on 2017/05/12 14:35:16 UTC

[06/37] lucene-solr:branch_6_6: squash merge jira/solr-10290 into master

http://git-wip-us.apache.org/repos/asf/lucene-solr/blob/c8c2aab8/solr/solr-ref-guide/src/the-standard-query-parser.adoc
----------------------------------------------------------------------
diff --git a/solr/solr-ref-guide/src/the-standard-query-parser.adoc b/solr/solr-ref-guide/src/the-standard-query-parser.adoc
new file mode 100644
index 0000000..99bd234
--- /dev/null
+++ b/solr/solr-ref-guide/src/the-standard-query-parser.adoc
@@ -0,0 +1,402 @@
+= The Standard Query Parser
+:page-shortname: the-standard-query-parser
+:page-permalink: the-standard-query-parser.html
+
+Solr's default Query Parser is also known as the "```lucene```" parser.
+
+The key advantage of the standard query parser is that it supports a robust and fairly intuitive syntax allowing you to create a variety of structured queries. Its largest disadvantage is that it's very intolerant of syntax errors, compared with something like the <<the-dismax-query-parser.adoc#the-dismax-query-parser,DisMax>> query parser, which is designed to throw as few errors as possible.
+
+[[TheStandardQueryParser-StandardQueryParserParameters]]
+== Standard Query Parser Parameters
+
+In addition to the <<common-query-parameters.adoc#common-query-parameters,Common Query Parameters>>, <<faceting.adoc#faceting,Faceting Parameters>>, <<highlighting.adoc#highlighting,Highlighting Parameters>>, and <<morelikethis.adoc#morelikethis,MoreLikeThis Parameters>>, the standard query parser supports the parameters described in the table below.
+
+// TODO: Change column width to %autowidth.spread when https://github.com/asciidoctor/asciidoctor-pdf/issues/599 is fixed
+
+[cols="30,70",options="header"]
+|===
+|Parameter |Description
+|q |Defines a query using standard query syntax. This parameter is mandatory.
+|q.op |Specifies the default operator for query expressions, overriding the default operator specified in the Schema. Possible values are "AND" or "OR".
+|df |Specifies a default field, overriding the definition of a default field in the Schema.
+|sow |Split on whitespace: if set to `false`, whitespace-separated term sequences will be provided to text analysis in one shot, enabling proper function of analysis filters that operate over term sequences, e.g. multi-word synonyms and shingles. Defaults to `true`: text analysis is invoked separately for each individual whitespace-separated term.
+|===
+
+Default parameter values are specified in `solrconfig.xml`, or overridden by query-time values in the request.
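+
+For example, assuming a query-time synonym mapping such as `ny => new york` (a hypothetical entry in `synonyms.txt`), setting `sow=false` allows the synonym filter to see the whole whitespace-separated term sequence:
+
+`\http://localhost:8983/solr/techproducts/select?q=ny+city&sow=false&df=text`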
+
+
+[[TheStandardQueryParser-TheStandardQueryParser_sResponse]]
+== The Standard Query Parser's Response
+
+By default, the response from the standard query parser contains one `<result>` block, which is unnamed. If the <<common-query-parameters.adoc#CommonQueryParameters-ThedebugParameter,`debug` parameter>> is used, then an additional `<lst>` block will be returned, using the name "debug". This will contain useful debugging info, including the original query string, the parsed query string, and explain info for each document in the `<result>` block. If the <<common-query-parameters.adoc#CommonQueryParameters-TheexplainOtherParameter,`explainOther` parameter>> is also used, then additional explain info will be provided for all the documents matching that query.
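+
+For example, the following illustrative request adds debug output to a simple query against the `techproducts` example:
+
+`\http://localhost:8983/solr/techproducts/select?q=id:SP2514N&debug=true`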
+
+[[TheStandardQueryParser-SampleResponses]]
+=== Sample Responses
+
+This section presents examples of responses from the standard query parser.
+
+The URL below submits a simple query and requests the XML Response Writer to use indentation to make the XML response more readable.
+
+`\http://localhost:8983/solr/techproducts/select?q=id:SP2514N`
+
+Results:
+
+[source,xml]
+----
+<response>
+<responseHeader><status>0</status><QTime>1</QTime></responseHeader>
+<result numFound="1" start="0">
+ <doc>
+  <arr name="cat"><str>electronics</str><str>hard drive</str></arr>
+  <arr name="features"><str>7200RPM, 8MB cache, IDE Ultra ATA-133</str>
+    <str>NoiseGuard, SilentSeek technology, Fluid Dynamic Bearing (FDB) motor</str></arr>
+  <str name="id">SP2514N</str>
+  <bool name="inStock">true</bool>
+  <str name="manu">Samsung Electronics Co. Ltd.</str>
+  <str name="name">Samsung SpinPoint P120 SP2514N - hard drive - 250 GB - ATA-133</str>
+  <int name="popularity">6</int>
+  <float name="price">92.0</float>
+  <str name="sku">SP2514N</str>
+ </doc>
+</result>
+</response>
+----
+
+Here's an example of a query with a limited field list.
+
+`\http://localhost:8983/solr/techproducts/select?q=id:SP2514N&fl=id+name`
+
+Results:
+
+[source,xml]
+----
+<response>
+<responseHeader><status>0</status><QTime>2</QTime></responseHeader>
+<result numFound="1" start="0">
+ <doc>
+  <str name="id">SP2514N</str>
+  <str name="name">Samsung SpinPoint P120 SP2514N - hard drive - 250 GB - ATA-133</str>
+ </doc>
+</result>
+</response>
+----
+
+[[TheStandardQueryParser-SpecifyingTermsfortheStandardQueryParser]]
+== Specifying Terms for the Standard Query Parser
+
+A query to the standard query parser is broken up into terms and operators. There are two types of terms: single terms and phrases.
+
+* A single term is a single word such as "test" or "hello"
+* A phrase is a group of words surrounded by double quotes such as "hello dolly"
+
+Multiple terms can be combined together with Boolean operators to form more complex queries (as described below).
+
+[IMPORTANT]
+====
+
+It is important that the analyzer used for queries parses terms and phrases in a way that is consistent with the analyzer used for indexing; otherwise, searches may produce unexpected results.
+
+====
+
+[[TheStandardQueryParser-TermModifiers]]
+=== Term Modifiers
+
+Solr supports a variety of term modifiers that add flexibility or precision, as needed, to searches. These modifiers include wildcard characters, characters for making a search "fuzzy" or more general, and so on. The sections below describe these modifiers in detail.
+
+[[TheStandardQueryParser-WildcardSearches]]
+=== Wildcard Searches
+
+Solr's standard query parser supports single and multiple character wildcard searches within single terms. Wildcard characters can be applied to single terms, but not to search phrases.
+
+// TODO: Change column width to %autowidth.spread when https://github.com/asciidoctor/asciidoctor-pdf/issues/599 is fixed
+
+[cols="30,20,50",options="header"]
+|===
+|Wildcard Search Type |Special Character |Example
+|Single character (matches a single character) |`?` |The search string `te?t` would match both `test` and `text`.
+|Multiple characters (matches zero or more sequential characters) |`*` |The wildcard search `tes*` would match `test`, `testing`, and `tester`. You can also use wildcard characters in the middle of a term. For example, `te*t` would match `test` and `text`, and `*est` would match `pest` and `test`.
+|===
+
+[[TheStandardQueryParser-FuzzySearches]]
+=== Fuzzy Searches
+
+Solr's standard query parser supports fuzzy searches based on the Damerau-Levenshtein Distance or Edit Distance algorithm. Fuzzy searches discover terms that are similar to a specified term without necessarily being an exact match. To perform a fuzzy search, use the tilde ~ symbol at the end of a single-word term. For example, to search for a term similar in spelling to "roam," use the fuzzy search:
+
+`roam~`
+
+This search will match terms like roams, foam, and foams. It will also match the word "roam" itself.
+
+An optional distance parameter specifies the maximum number of edits allowed, between 0 and 2, defaulting to 2. For example:
+
+`roam~1`
+
+This will match terms like roams and foam, but not foams, since foams has an edit distance of 2.
+
+[IMPORTANT]
+====
+
+In many cases, stemming (reducing terms to a common stem) can produce similar effects to fuzzy searches and wildcard searches.
+
+====
+
+[[TheStandardQueryParser-ProximitySearches]]
+=== Proximity Searches
+
+A proximity search looks for terms that are within a specific distance from one another.
+
+To perform a proximity search, add the tilde character ~ and a numeric value to the end of a search phrase. For example, to search for "apache" and "jakarta" within 10 words of each other in a document, use the search:
+
+`"jakarta apache"~10`
+
+The distance referred to here is the number of term movements needed to match the specified phrase. In the example above, if "apache" and "jakarta" were 10 spaces apart in a field, but "apache" appeared before "jakarta", more than 10 term movements would be required to move the terms together and position "apache" to the right of "jakarta" with a space in between.
+
+[[TheStandardQueryParser-RangeSearches]]
+=== Range Searches
+
+A range search specifies a range of values for a field (a range with an upper bound and a lower bound). The query matches documents whose values for the specified field or fields fall within the range. Range queries can be inclusive or exclusive of the upper and lower bounds. Sorting is done lexicographically, except on numeric fields. For example, the range query below matches all documents whose `popularity` field has a value between 52 and 10,000, inclusive.
+
+`popularity:[52 TO 10000]`
+
+Range queries are not limited to date fields or even numerical fields. You could also use range queries with non-date fields:
+
+`title:{Aida TO Carmen}`
+
+This will find all documents whose titles are between Aida and Carmen, but not including Aida and Carmen.
+
+The brackets around a query determine its inclusiveness.
+
+* Square brackets `[` & `]` denote an inclusive range query that matches values including the upper and lower bound.
+* Curly brackets `{` & `}` denote an exclusive range query that matches values between the upper and lower bounds, but excluding the upper and lower bounds themselves.
+* You can mix these types so one end of the range is inclusive and the other is exclusive. Here's an example: `count:{1 TO 10]`
+
+
+[[TheStandardQueryParser-BoostingaTermwith_]]
+=== Boosting a Term with `^`
+
+Lucene/Solr provides the relevance level of matching documents based on the terms found. To boost a term use the caret symbol `^` with a boost factor (a number) at the end of the term you are searching. The higher the boost factor, the more relevant the term will be.
+
+Boosting allows you to control the relevance of a document by boosting its term. For example, if you are searching for "jakarta apache" and you want the term "jakarta" to be more relevant, you can boost it by adding the `^` symbol along with the boost factor immediately after the term. For example, you could type:
+
+`jakarta^4 apache`
+
+This will make documents with the term jakarta appear more relevant. You can also boost Phrase Terms as in the example:
+
+`"jakarta apache"^4 "Apache Lucene"`
+
+By default, the boost factor is 1. Although the boost factor must be positive, it can be less than 1 (for example, it could be 0.2).
+
+
+[[TheStandardQueryParser-ConstantScorewith_]]
+=== Constant Score with `^=`
+
+Constant score queries are created with `<query_clause>^=<score>`, which sets the entire clause to the specified score for any documents matching that clause. This is desirable when you only care about matches for a particular clause and don't want other relevancy factors such as term frequency (the number of times the term appears in the field) or inverse document frequency (a measure across the whole index for how rare a term is in a field).
+
+Example:
+
+[source,text]
+(description:blue OR color:blue)^=1.0 text:shoes
+
+
+[[TheStandardQueryParser-SpecifyingFieldsinaQuerytotheStandardQueryParser]]
+== Specifying Fields in a Query to the Standard Query Parser
+
+Data indexed in Solr is organized in fields, which are <<defining-fields.adoc#defining-fields,defined in the Solr Schema>>. Searches can take advantage of fields to add precision to queries. For example, you can search for a term only in a specific field, such as a title field.
+
+The Schema defines one field as a default field. If you do not specify a field in a query, Solr searches only the default field. Alternatively, you can specify a different field or a combination of fields in a query.
+
+To specify a field, type the field name followed by a colon ":" and then the term you are searching for within the field.
+
+For example, suppose an index contains two fields, title and text, and that text is the default field. If you want to find a document called "The Right Way" which contains the text "don't go this way," you could include either of the following terms in your search query:
+
+`title:"The Right Way" AND text:go`
+
+`title:"Do it right" AND go`
+
+Since text is the default field, the field indicator is not required; hence the second query above omits it.
+
+The field is only valid for the term that it directly precedes, so the query `title:Do it right` will find only "Do" in the title field. It will find "it" and "right" in the default field (in this case the text field).
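+
+To apply the field to all three terms, quote the phrase or group the terms, as in this illustrative query (see <<TheStandardQueryParser-GroupingClauseswithinaField,Grouping Clauses within a Field>> below):
+
+`title:(Do it right)`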
+
+[[TheStandardQueryParser-BooleanOperatorsSupportedbytheStandardQueryParser]]
+== Boolean Operators Supported by the Standard Query Parser
+
+Boolean operators allow you to apply Boolean logic to queries, requiring the presence or absence of specific terms or conditions in fields in order to match documents. The table below summarizes the Boolean operators supported by the standard query parser.
+
+// TODO: Change column width to %autowidth.spread when https://github.com/asciidoctor/asciidoctor-pdf/issues/599 is fixed
+
+[cols="20,40,40",options="header"]
+|===
+|Boolean Operator |Alternative Symbol |Description
+|AND |`&&` |Requires both terms on either side of the Boolean operator to be present for a match.
+|NOT |`!` |Requires that the following term not be present.
+|OR |`\|\|` |Requires that either term (or both terms) be present for a match.
+| |`+` |Requires that the following term be present.
+| |`-` |Prohibits the following term (that is, matches on fields or documents that do not include that term). The `-` operator is functionally similar to the Boolean operator `!`. Because it's used by popular search engines such as Google, it may be more familiar to some user communities.
+|===
+
+Boolean operators allow terms to be combined through logic operators. Lucene supports AND, "`+`", OR, NOT and "`-`" as Boolean operators.
+
+[IMPORTANT]
+====
+
+When specifying Boolean operators with keywords such as AND or NOT, the keywords must appear in all uppercase.
+
+====
+
+[NOTE]
+====
+
+The standard query parser supports all the Boolean operators listed in the table above. The DisMax query parser supports only `+` and `-`.
+
+====
+
+The OR operator is the default conjunction operator. This means that if there is no Boolean operator between two terms, the OR operator is used. The OR operator links two terms and finds a matching document if either of the terms exist in a document. This is equivalent to a union using sets. The symbol || can be used in place of the word OR.
+
+To search for documents that contain either "jakarta apache" or just "jakarta," use the query:
+
+`"jakarta apache" jakarta`
+
+or
+
+`"jakarta apache" OR jakarta`
+
+
+[[TheStandardQueryParser-TheBooleanOperator_]]
+=== The Boolean Operator `+`
+
+The `+` symbol (also known as the "required" operator) requires that the term after the `+` symbol exist somewhere in a field in at least one document in order for the query to return a match.
+
+For example, to search for documents that must contain "jakarta" and that may or may not contain "lucene," use the following query:
+
+`+jakarta lucene`
+
+[NOTE]
+====
+
+This operator is supported by both the standard query parser and the DisMax query parser.
+
+====
+
+
+[[TheStandardQueryParser-TheBooleanOperatorAND_]]
+=== The Boolean Operator AND (`&&`)
+
+The AND operator matches documents where both terms exist anywhere in the text of a single document. This is equivalent to an intersection using sets. The symbol `&&` can be used in place of the word AND.
+
+To search for documents that contain "jakarta apache" and "Apache Lucene," use either of the following queries:
+
+`"jakarta apache" AND "Apache Lucene"`
+
+`"jakarta apache" && "Apache Lucene"`
+
+
+[[TheStandardQueryParser-TheBooleanOperatorNOT_]]
+=== The Boolean Operator NOT (`!`)
+
+The NOT operator excludes documents that contain the term after NOT. This is equivalent to a difference using sets. The symbol `!` can be used in place of the word NOT.
+
+The following queries search for documents that contain the phrase "jakarta apache" but do not contain the phrase "Apache Lucene":
+
+`"jakarta apache" NOT "Apache Lucene"`
+
+`"jakarta apache" ! "Apache Lucene"`
+
+[[TheStandardQueryParser-TheBooleanOperator-]]
+=== The Boolean Operator `-`
+
+The `-` symbol or "prohibit" operator excludes documents that contain the term after the `-` symbol.
+
+For example, to search for documents that contain "jakarta apache" but not "Apache Lucene," use the following query:
+
+`"jakarta apache" -"Apache Lucene"`
+
+[[TheStandardQueryParser-EscapingSpecialCharacters]]
+=== Escaping Special Characters
+
+Solr gives the following characters special meaning when they appear in a query:
+
+`+` `-` `&&` `||` `!` `(` `)` `{` `}` `[` `]` `^` `"` `~` `*` `?` `:` `/`
+
+To make Solr interpret any of these characters literally, rather than as a special character, precede the character with a backslash character `\`. For example, to search for (1+1):2 without having Solr interpret the plus sign and parentheses as special characters for formulating a sub-query with two terms, escape the characters by preceding each one with a backslash:
+
+[source,plain]
+----
+\(1\+1\)\:2
+----
+
+[[TheStandardQueryParser-GroupingTermstoFormSub-Queries]]
+== Grouping Terms to Form Sub-Queries
+
+Lucene/Solr supports using parentheses to group clauses to form sub-queries. This can be very useful if you want to control the Boolean logic for a query.
+
+The query below searches for either "jakarta" or "apache" and "website":
+
+`(jakarta OR apache) AND website`
+
+This adds precision to the query, requiring that the term "website" exist, along with either the term "jakarta" or "apache".
+
+[[TheStandardQueryParser-GroupingClauseswithinaField]]
+=== Grouping Clauses within a Field
+
+To apply two or more Boolean operators to a single field in a search, group the Boolean clauses within parentheses. For example, the query below searches for a title field that contains both the word "return" and the phrase "pink panther":
+
+`title:(+return +"pink panther")`
+
+[[TheStandardQueryParser-Comments]]
+== Comments
+
+C-style comments are supported in query strings.
+
+Example:
+
+`"jakarta apache" /* this is a comment in the middle of a normal query string */ OR jakarta`
+
+Comments may be nested.
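+
+For example, this illustrative query nests one comment inside another:
+
+`"jakarta apache" /* outer /* inner */ comment */ OR jakarta`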
+
+[[TheStandardQueryParser-DifferencesbetweenLuceneQueryParserandtheSolrStandardQueryParser]]
+== Differences between Lucene Query Parser and the Solr Standard Query Parser
+
+Solr's standard query parser differs from the Lucene Query Parser in the following ways:
+
+* A `*` may be used for either or both endpoints to specify an open-ended range query
+** `field:[* TO 100]` finds all field values less than or equal to 100
+** `field:[100 TO *]` finds all field values greater than or equal to 100
+** `field:[* TO *]` matches all documents with the field
+* Pure negative queries (all clauses prohibited) are allowed (only as a top-level clause)
+** `-inStock:false` finds all field values where inStock is not false
+** `-field:[* TO *]` finds all documents without a value for field
+* A hook into FunctionQuery syntax. You'll need to use quotes to encapsulate the function if it includes parentheses, as shown in the second example below:
+** `_val_:myfield`
+** `_val_:"recip(rord(myfield),1,2,3)"`
+* Support for using any type of query parser as a nested clause.
+** `inStock:true OR {!dismax qf='name manu' v='ipod'}`
+* Support for a special `filter(...)` syntax to indicate that some query clauses should be cached in the filter cache (as a constant score boolean query). This allows sub-queries to be cached and re-used in other queries. For example `inStock:true` will be cached and re-used in all three of the queries below:
+** `q=features:songs OR filter(inStock:true)`
+** `q=+manu:Apple +filter(inStock:true)`
+** `q=+manu:Apple & fq=inStock:true`
++
+This can even be used to cache individual clauses of complex filter queries. In the first query below, 3 items will be added to the filter cache (the top level `fq` and both `filter(...)` clauses) and in the second query, there will be 2 cache hits, and one new cache insertion (for the new top level `fq`):
+** `q=features:songs & fq=+filter(inStock:true) +filter(price:[* TO 100])`
+** `q=manu:Apple & fq=-filter(inStock:true) -filter(price:[* TO 100])`
+* Range queries ("[a TO z]"), prefix queries ("a*"), and wildcard queries ("a*b") are constant-scoring (all matching documents get an equal score). The scoring factors TF, IDF, index boost, and "coord" are not used. There is no limitation on the number of terms that match (as there was in past versions of Lucene).
+* Constant score queries are created with `<query_clause>^=<score>`, which sets the entire clause to the specified score for any documents matching that clause:
+** `q=(description:blue color:blue)^=1.0 title:blue^=5.0`
+
+[[TheStandardQueryParser-SpecifyingDatesandTimes]]
+=== Specifying Dates and Times
+
+Queries against fields using the `TrieDateField` type (typically range queries) should use the <<working-with-dates.adoc#working-with-dates,appropriate date syntax>>:
+
+* `timestamp:[* TO NOW]`
+* `createdate:[1976-03-06T23:59:59.999Z TO *]`
+* `createdate:[1995-12-31T23:59:59.999Z TO 2007-03-06T00:00:00Z]`
+* `pubdate:[NOW-1YEAR/DAY TO NOW/DAY+1DAY]`
+* `createdate:[1976-03-06T23:59:59.999Z TO 1976-03-06T23:59:59.999Z+1YEAR]`
+* `createdate:[1976-03-06T23:59:59.999Z/YEAR TO 1976-03-06T23:59:59.999Z]`
+
+[[TheStandardQueryParser-RelatedTopics]]
+== Related Topics
+
+* <<local-parameters-in-queries.adoc#local-parameters-in-queries,Local Parameters in Queries>>
+* <<other-parsers.adoc#other-parsers,Other Parsers>>

http://git-wip-us.apache.org/repos/asf/lucene-solr/blob/c8c2aab8/solr/solr-ref-guide/src/the-stats-component.adoc
----------------------------------------------------------------------
diff --git a/solr/solr-ref-guide/src/the-stats-component.adoc b/solr/solr-ref-guide/src/the-stats-component.adoc
new file mode 100644
index 0000000..c5ac9ef
--- /dev/null
+++ b/solr/solr-ref-guide/src/the-stats-component.adoc
@@ -0,0 +1,179 @@
+= The Stats Component
+:page-shortname: the-stats-component
+:page-permalink: the-stats-component.html
+
+The Stats component returns simple statistics for numeric, string, and date fields within the document set.
+
+The sample queries in this section assume you are running the "```techproducts```" example included with Solr:
+
+[source,bash]
+----
+bin/solr -e techproducts
+----
+
+[[TheStatsComponent-StatsComponentParameters]]
+== Stats Component Parameters
+
+The Stats Component accepts the following parameters:
+
+// TODO: Change column width to %autowidth.spread when https://github.com/asciidoctor/asciidoctor-pdf/issues/599 is fixed
+
+[cols="30,70",options="header"]
+|===
+|Parameter |Description
+|stats |If **true**, then invokes the Stats component.
+|stats.field a|
+Specifies a field for which statistics should be generated. This parameter may be invoked multiple times in a query in order to request statistics on multiple fields.
+
+<<local-parameters-in-queries.adoc#local-parameters-in-queries,Local Parameters>> may be used to indicate which subset of the supported statistics should be computed, and/or that statistics should be computed over the results of an arbitrary numeric function (or query) instead of a simple field name. See the examples below.
+
+|stats.facet a|
+Returns sub-results for values within the specified facet.
+
+This legacy parameter is not recommended for new users - instead, please consider <<TheStatsComponent-TheStatsComponentandFaceting,combining `stats.field` with `facet.pivot`>>.
+
+|stats.calcdistinct a|
+If **true**, the "countDistinct" and "distinctValues" statistics will be computed and included the response. These calculations can be very expensive for fields that do not have a tiny cardinality, so they are disabled by default.
+
+This parameter can be specified using a per-field override (i.e., `f.<field>.stats.calcdistinct=true`), but users are encouraged to instead request the statistics desired <<TheStatsComponent-LocalParameters,as Local Parameters>>. As a top-level request parameter, this option is deprecated.
+
+|===
+
+[[TheStatsComponent-Example]]
+=== Example
+
+The query below demonstrates computing stats against two different numeric fields, as well as stats over the results of a `termfreq()` function call using the `text` field:
+
+`\http://localhost:8983/solr/techproducts/select?q=*:*&stats=true&stats.field={!func}termfreq('text','memory')&stats.field=price&stats.field=popularity&rows=0&indent=true`
+
+[source,xml]
+----
+<lst name="stats">
+  <lst name="stats_fields">
+    <lst name="termfreq(text,memory)">
+      <double name="min">0.0</double>
+      <double name="max">3.0</double>
+      <long name="count">32</long>
+      <long name="missing">0</long>
+      <double name="sum">10.0</double>
+      <double name="sumOfSquares">22.0</double>
+      <double name="mean">0.3125</double>
+      <double name="stddev">0.7803018439949604</double>
+      <lst name="facets"/>
+    </lst>
+    <lst name="price">
+      <double name="min">0.0</double>
+      <double name="max">2199.0</double>
+      <long name="count">16</long>
+      <long name="missing">16</long>
+      <double name="sum">5251.270030975342</double>
+      <double name="sumOfSquares">6038619.175900028</double>
+      <double name="mean">328.20437693595886</double>
+      <double name="stddev">536.3536996709846</double>
+      <lst name="facets"/>
+    </lst>
+    <lst name="popularity">
+      <double name="min">0.0</double>
+      <double name="max">10.0</double>
+      <long name="count">15</long>
+      <long name="missing">17</long>
+      <double name="sum">85.0</double>
+      <double name="sumOfSquares">603.0</double>
+      <double name="mean">5.666666666666667</double>
+      <double name="stddev">2.943920288775949</double>
+      <lst name="facets"/>
+    </lst>
+  </lst>
+</lst>
+----
+
+[[TheStatsComponent-StatisticsSupported]]
+== Statistics Supported
+
+The table below explains the statistics supported by the Stats component. Not all statistics are supported for all field types, and not all statistics are computed by default (see <<TheStatsComponent-LocalParameters,Local Parameters>> below for details).
+
+// TODO: Change column width to %autowidth.spread when https://github.com/asciidoctor/asciidoctor-pdf/issues/599 is fixed
+
+[cols="10,10,50,20,10",options="header"]
+|===
+|Local Param |Sample Input |Description |Supported Types |Computed by Default
+|min |true |The minimum value of the field/function in all documents in the set. |All |Yes
+|max |true |The maximum value of the field/function in all documents in the set. |All |Yes
+|sum |true |The sum of all values of the field/function in all documents in the set. |Numeric & Date |Yes
+|count |true |The number of values found in all documents in the set for this field/function. |All |Yes
+|missing |true |The number of documents in the set which do not have a value for this field/function. |All |Yes
+|sumOfSquares |true |Sum of all values squared (a byproduct of computing stddev) |Numeric & Date |Yes
+|mean |true |The average `(v1 + v2 + ... + vN)/N` |Numeric & Date |Yes
+|stddev |true |Standard deviation, measuring how widely spread the values in the data set are. |Numeric & Date |Yes
+|percentiles |"1,99,99.9" |A list of percentile values based on cut-off points specified by the param value. These values are an approximation, using the https://github.com/tdunning/t-digest/blob/master/docs/t-digest-paper/histo.pdf[t-digest algorithm]. |Numeric |No
+|distinctValues |true |The set of all distinct values for the field/function in all of the documents in the set. This calculation can be very expensive for fields that do not have a tiny cardinality. |All |No
+|countDistinct |true |The exact number of distinct values in the field/function in all of the documents in the set. This calculation can be very expensive for fields that do not have a tiny cardinality. |All |No
+|cardinality |"true" or "0.3" |A statistical approximation (currently using the https://en.wikipedia.org/wiki/HyperLogLog[HyperLogLog] algorithm) of the number of distinct values in the field/function in all of the documents in the set. This calculation is much more efficient than using the 'countDistinct' option, but may not be 100% accurate. Input for this option can be a floating point number between 0.0 and 1.0 indicating how aggressively the algorithm should try to be accurate: 0.0 means use as little memory as possible; 1.0 means use as much memory as needed to be as accurate as possible. 'true' is supported as an alias for "0.3". |All |No
+|===
+
+[[TheStatsComponent-LocalParameters]]
+== Local Parameters
+
+Similar to the <<faceting.adoc#faceting,Facet Component>>, the `stats.field` parameter supports local parameters for:
+
+* Tagging & Excluding Filters: `stats.field={!ex=filterA}price`
+* Changing the Output Key: `stats.field={!key=my_price_stats}price`
+* Tagging stats for <<TheStatsComponent-TheStatsComponentandFaceting,use with `facet.pivot`>>: `stats.field={!tag=my_pivot_stats}price`
+
+Local parameters can also be used to specify individual statistics by name, overriding the set of statistics computed by default, e.g.: `stats.field={!min=true max=true percentiles='99,99.9,99.99'}price`
+
+[IMPORTANT]
+====
+If any supported statistics are specified via local parameters, then the entire set of default statistics is overridden and only the requested statistics are computed.
+====
+
+Additional "Expert" local params are supported in some cases for affecting the behavior of some statistics:
+
+* `percentiles`
+** `tdigestCompression` - a positive numeric value defaulting to `100.0` controlling the compression factor of the T-Digest. Larger values mean more accuracy, but also use more memory.
+* `cardinality`
+** `hllPreHashed` - a boolean option indicating that the statistics are being computed over a "long" field that has already been hashed at index time – allowing the HLL computation to skip this step.
+** `hllLog2m` - an integer value specifying an explicit "log2m" value to use, overriding the heuristic value determined by the cardinality local param and the field type – see the https://github.com/aggregateknowledge/java-hll/[java-hll] documentation for more details
+** `hllRegwidth` - an integer value specifying an explicit "regwidth" value to use, overriding the heuristic value determined by the cardinality local param and the field type – see the https://github.com/aggregateknowledge/java-hll/[java-hll] documentation for more details
+* `calcDistinct` - for backwards compatibility, `calcDistinct=true` may be specified as an alias for both `countDistinct=true` and `distinctValues=true`
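+
+For example, a hypothetical request combining some of these expert options (the field names and values are illustrative):
+
+`stats.field={!percentiles='99,99.9' tdigestCompression=200.0}price&stats.field={!cardinality=0.3}manu_id_s`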
+
+[[TheStatsComponent-Examples]]
+=== Examples
+
+Here we compute some statistics for the price field. The min, max, mean, 90th, and 99th percentile price values are computed against all products that are in stock (`q=*:*` and `fq=inStock:true`), and independently all of the default statistics are computed against all products regardless of whether they are in stock or not (by excluding that filter).
+
+`\http://localhost:8983/solr/techproducts/select?q=*:*&fq={!tag=stock_check}inStock:true&stats=true&stats.field={!ex=stock_check+key=instock_prices+min=true+max=true+mean=true+percentiles='90,99'}price&stats.field={!key=all_prices}price&rows=0&indent=true`
+
+[source,xml]
+----
+<lst name="stats">
+  <lst name="stats_fields">
+    <lst name="instock_prices">
+      <double name="min">0.0</double>
+      <double name="max">2199.0</double>
+      <double name="mean">328.20437693595886</double>
+      <lst name="percentiles">
+        <double name="90.0">564.9700012207031</double>
+        <double name="99.0">1966.6484985351556</double>
+      </lst>
+    </lst>
+    <lst name="all_prices">
+      <double name="min">0.0</double>
+      <double name="max">2199.0</double>
+      <long name="count">12</long>
+      <long name="missing">5</long>
+      <double name="sum">4089.880027770996</double>
+      <double name="sumOfSquares">5385249.921747174</double>
+      <double name="mean">340.823335647583</double>
+      <double name="stddev">602.3683083752779</double>
+    </lst>
+  </lst>
+</lst>
+----
+
+[[TheStatsComponent-TheStatsComponentandFaceting]]
+== The Stats Component and Faceting
+
+Although the `stats.facet` parameter is no longer recommended, sets of `stats.field` parameters can be referenced by '`tag`' when using Pivot Faceting to compute multiple statistics at every level (i.e., field) in the tree of pivot constraints.
+
+For more information and a detailed example, please see <<faceting.adoc#Faceting-CombiningStatsComponentWithPivots,Combining Stats Component With Pivots>>.

http://git-wip-us.apache.org/repos/asf/lucene-solr/blob/c8c2aab8/solr/solr-ref-guide/src/the-term-vector-component.adoc
----------------------------------------------------------------------
diff --git a/solr/solr-ref-guide/src/the-term-vector-component.adoc b/solr/solr-ref-guide/src/the-term-vector-component.adoc
new file mode 100644
index 0000000..a8189c8
--- /dev/null
+++ b/solr/solr-ref-guide/src/the-term-vector-component.adoc
@@ -0,0 +1,147 @@
+= The Term Vector Component
+:page-shortname: the-term-vector-component
+:page-permalink: the-term-vector-component.html
+
+The TermVectorComponent is a search component designed to return additional information about documents matching your search.
+
+For each document in the response, the TermVectorComponent can return the term vector, the term frequency, inverse document frequency, position, and offset information.
+
+[[TheTermVectorComponent-Configuration]]
+== Configuration
+
+The TermVectorComponent is not enabled implicitly in Solr - it must be explicitly configured in your `solrconfig.xml` file. The examples on this page show how it is configured in Solr's "```techproducts```" example:
+
+[source,bash]
+----
+bin/solr -e techproducts
+----
+
+To enable this component, you need to configure it using a `searchComponent` element:
+
+[source,xml]
+----
+<searchComponent name="tvComponent" class="org.apache.solr.handler.component.TermVectorComponent"/>
+----
+
+A request handler must then be configured to use this component name. In the `techproducts` example, the component is associated with a special request handler named `/tvrh` that enables term vectors by default using the `tv=true` parameter, but you can associate it with any request handler:
+
+[source,xml]
+----
+<requestHandler name="/tvrh" class="org.apache.solr.handler.component.SearchHandler">
+  <lst name="defaults">
+    <bool name="tv">true</bool>
+  </lst>
+  <arr name="last-components">
+    <str>tvComponent</str>
+  </arr>
+</requestHandler>
+----
+
+Once your handler is defined, you may use it in conjunction with any schema (that has a `uniqueKeyField`) to fetch term vectors for fields configured with the `termVectors` attribute, such as in the `techproducts` sample schema. For example:
+
+[source,xml]
+----
+<field name="includes"
+       type="text_general"
+       indexed="true"
+       stored="true"
+       multiValued="true"
+       termVectors="true"
+       termPositions="true"
+       termOffsets="true" />
+----
+
+[[TheTermVectorComponent-InvokingtheTermVectorComponent]]
+== Invoking the Term Vector Component
+
+The example below shows an invocation of this component using the above configuration:
+
+`\http://localhost:8983/solr/techproducts/tvrh?q=*:*&start=0&rows=10&fl=id,includes`
+
+[source,xml]
+----
+...
+<lst name="termVectors">
+  <lst name="GB18030TEST">
+    <str name="uniqueKey">GB18030TEST</str>
+  </lst>
+  <lst name="EN7800GTX/2DHTV/256M">
+    <str name="uniqueKey">EN7800GTX/2DHTV/256M</str>
+  </lst>
+  <lst name="100-435805">
+    <str name="uniqueKey">100-435805</str>
+  </lst>
+  <lst name="3007WFP">
+    <str name="uniqueKey">3007WFP</str>
+    <lst name="includes">
+      <lst name="cable"/>
+      <lst name="usb"/>
+    </lst>
+  </lst>
+  <lst name="SOLR1000">
+    <str name="uniqueKey">SOLR1000</str>
+  </lst>
+  <lst name="0579B002">
+    <str name="uniqueKey">0579B002</str>
+  </lst>
+  <lst name="UTF8TEST">
+    <str name="uniqueKey">UTF8TEST</str>
+  </lst>
+  <lst name="9885A004">
+    <str name="uniqueKey">9885A004</str>
+    <lst name="includes">
+      <lst name="32mb"/>
+      <lst name="av"/>
+      <lst name="battery"/>
+      <lst name="cable"/>
+      <lst name="card"/>
+      <lst name="sd"/>
+      <lst name="usb"/>
+    </lst>
+  </lst>
+  <lst name="adata">
+    <str name="uniqueKey">adata</str>
+  </lst>
+  <lst name="apple">
+    <str name="uniqueKey">apple</str>
+  </lst>
+</lst>
+----
+
+[[TheTermVectorComponent-RequestParameters]]
+=== Request Parameters
+
+The example below shows the available request parameters for this component:
+
+`\http://localhost:8983/solr/techproducts/tvrh?q=includes:[* TO *]&rows=10&indent=true&tv=true&tv.tf=true&tv.df=true&tv.positions=true&tv.offsets=true&tv.payloads=true&tv.fl=includes`
+
+// TODO: Change column width to %autowidth.spread when https://github.com/asciidoctor/asciidoctor-pdf/issues/599 is fixed
+
+[cols="20,60,20",options="header"]
+|===
+|Parameter |Description |Type
+|tv |Should the component run or not |boolean
+|tv.docIds |Returns term vectors for the specified list of Lucene document IDs (not the Solr Unique Key). |comma-separated integers
+|tv.fl |Returns term vectors for the specified list of fields. If not specified, the `fl` parameter is used. |comma-separated list of field names
+|tv.all |A shortcut that invokes all the boolean parameters listed below. |boolean
+|tv.df |Returns the Document Frequency (DF) of the term in the collection. This can be computationally expensive. |boolean
+|tv.offsets |Returns offset information for each term in the document. |boolean
+|tv.positions |Returns position information. |boolean
+|tv.payloads |Returns payload information. |boolean
+|tv.tf |Returns document term frequency info per term in the document. |boolean
+|tv.tf_idf a|
+Calculates TF / DF (i.e., TF * IDF) for each term. Please note that this is a _literal_ calculation of "Term Frequency multiplied by Inverse Document Frequency" and *not* a classical TF-IDF similarity measure.
+
+Requires the parameters `tv.tf` and `tv.df` to be "true". This can be computationally expensive. (The results are not shown in example output)
+
+ |boolean
+|===
+
+To learn more about TermVector component output, see the Wiki page: http://wiki.apache.org/solr/TermVectorComponentExampleOptions
+
+For schema requirements, see the Wiki page: http://wiki.apache.org/solr/FieldOptionsByUseCase
+
+[[TheTermVectorComponent-SolrJandtheTermVectorComponent]]
+== SolrJ and the Term Vector Component
+
+Neither the SolrQuery class nor the QueryResponse class offer specific method calls to set Term Vector Component parameters or get the "termVectors" output. However, there is a patch for it: https://issues.apache.org/jira/browse/SOLR-949[SOLR-949].

http://git-wip-us.apache.org/repos/asf/lucene-solr/blob/c8c2aab8/solr/solr-ref-guide/src/the-terms-component.adoc
----------------------------------------------------------------------
diff --git a/solr/solr-ref-guide/src/the-terms-component.adoc b/solr/solr-ref-guide/src/the-terms-component.adoc
new file mode 100644
index 0000000..1625be2
--- /dev/null
+++ b/solr/solr-ref-guide/src/the-terms-component.adoc
@@ -0,0 +1,295 @@
+= The Terms Component
+:page-shortname: the-terms-component
+:page-permalink: the-terms-component.html
+
+The Terms Component provides access to the indexed terms in a field and the number of documents that match each term. This can be useful for building an auto-suggest feature or any other feature that operates at the term level instead of the search or document level. Retrieving terms in index order is very fast since the implementation directly uses Lucene's TermEnum to iterate over the term dictionary.
+
+In a sense, this search component provides fast field-faceting over the whole index, not restricted by the base query or any filters. The document frequencies returned are the number of documents that match the term, including any documents that have been marked for deletion but not yet removed from the index.
+
+[[TheTermsComponent-ConfiguringtheTermsComponent]]
+== Configuring the Terms Component
+
+By default, the Terms Component is already configured in `solrconfig.xml` for each collection.
+
+[[TheTermsComponent-DefiningtheTermsComponent]]
+=== Defining the Terms Component
+
+Defining the Terms search component is straightforward: simply give it a name and use the class `solr.TermsComponent`.
+
+[source,xml]
+----
+<searchComponent name="terms" class="solr.TermsComponent"/>
+----
+
+This makes the component available for use, but by itself will not be usable until included with a request handler.
+
+[[TheTermsComponent-UsingtheTermsComponentinaRequestHandler]]
+=== Using the Terms Component in a Request Handler
+
+The terms component is included with the `/terms` request handler, which is among Solr's out-of-the-box request handlers - see <<implicit-requesthandlers.adoc#implicit-requesthandlers,Implicit RequestHandlers>>.
+
+Note that the defaults for this request handler set the parameter "terms" to true, which allows terms to be returned on request. The parameter "distrib" is set to false, which allows this handler to be used only on a single Solr core.
+
+You could add this component to another handler if you wanted to, and pass "terms=true" in the HTTP request in order to get terms back. If it is only defined in a separate handler, you must use that handler when querying in order to get terms and not regular documents as results.
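+
+For example, a minimal sketch of attaching the component defined above to another handler (the handler name `/myhandler` is hypothetical) might look like this:
+
+[source,xml]
+----
+<requestHandler name="/myhandler" class="solr.SearchHandler">
+  <arr name="last-components">
+    <str>terms</str>
+  </arr>
+</requestHandler>
+----
+
+A request to `/myhandler` would then need `terms=true` and a `terms.fl` parameter in order to return terms.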
+
+[[TheTermsComponent-TermsComponentParameters]]
+=== Terms Component Parameters
+
+The parameters below allow you to control what terms are returned. You can also configure any of these with the request handler if you'd like to set them permanently. Or, you can add them to the query request. These parameters are:
+
+// TODO: Change column width to %autowidth.spread when https://github.com/asciidoctor/asciidoctor-pdf/issues/599 is fixed
+
+[cols="20,15,15,50",options="header"]
+|===
+|Parameter |Required |Default |Description
+|terms |No |false a|
+If set to true, enables the Terms Component. By default, the Terms Component is off.
+
+Example: `terms=true`
+
+|terms.fl |Yes |null a|
+Specifies the field from which to retrieve terms.
+
+Example: `terms.fl=title`
+
+|terms.list |No |null a|
+Fetches the document frequency for a comma delimited list of terms. Terms are always returned in index order. If '`terms.ttf`' is set to true, also returns their total term frequency. If multiple '`terms.fl`' are defined, these statistics will be returned for each term in each requested field.
+
+Example: `terms.list=termA,termB,termC`
+
+|terms.limit |No |10 a|
+Specifies the maximum number of terms to return. The default is 10. If the limit is set to a number less than 0, then no maximum limit is enforced. Although this is not required, either this parameter or `terms.upper` must be defined.
+
+Example: `terms.limit=20`
+
+|terms.lower |No |empty string a|
+Specifies the term at which to start. If not specified, the empty string is used, causing Solr to start at the beginning of the field.
+
+Example: `terms.lower=orange`
+
+|terms.lower.incl |No |true a|
+If set to true, includes the lower-bound term (specified with `terms.lower`) in the result set.
+
+Example: `terms.lower.incl=false`
+
+|terms.mincount |No |null a|
+Specifies the minimum document frequency to return in order for a term to be included in a query response. Results are inclusive of the mincount (that is, >= mincount).
+
+Example: `terms.mincount=5`
+
+|terms.maxcount |No |null a|
+Specifies the maximum document frequency a term must have in order to be included in a query response. The default setting is -1, which sets no upper bound. Results are inclusive of the maxcount (that is, <= maxcount).
+
+Example: `terms.maxcount=25`
+
+|terms.prefix |No |null a|
+Restricts matches to terms that begin with the specified string.
+
+Example: `terms.prefix=inter`
+
+|terms.raw |No |false a|
+If set to true, returns the raw characters of the indexed term, regardless of whether it is human-readable. For instance, the indexed form of numeric fields is not human-readable.
+
+Example: `terms.raw=true`
+
+|terms.regex |No |null a|
+Restricts matches to terms that match the regular expression.
+
+Example: `terms.regex=.*pedist`
+
+|terms.regex.flag |No |null a|
+Defines a Java regex flag to use when evaluating the regular expression defined with `terms.regex`. See http://docs.oracle.com/javase/tutorial/essential/regex/pattern.html for details of each flag. Valid options are:
+
+* case_insensitive
+* comments
+* multiline
+* literal
+* dotall
+* unicode_case
+* canon_eq
+* unix_lines
+
+Example: `terms.regex.flag=case_insensitive`
+
+|terms.stats |No |null |Include index statistics in the results. Currently returns only the *numDocs* for a collection. When combined with `terms.list` it provides enough information to compute idf for a list of terms.
+|terms.sort |No |count a|
+Defines how to sort the terms returned. Valid options are **count**, which sorts by the term frequency, with the highest term frequency first, or **index**, which sorts in index order.
+
+Example: `terms.sort=index`
+
+|terms.ttf |No |false a|
+If set to true, returns both 'df' (docFreq) and 'ttf' (totalTermFreq) statistics for each requested term in '`terms.list`'. In this case, the response format is:
+
+[source,xml]
+----
+<lst name="terms">
+  <lst name="field">
+    <lst name="termA">
+      <long name="df">22</long>
+      <long name="ttf">73</long>
+    </lst>
+  </lst>
+</lst>
+----
+
+|terms.upper |No |null a|
+Specifies the term to stop at. Although this parameter is not required, either this parameter or `terms.limit` must be defined.
+
+Example: `terms.upper=plum`
+
+|terms.upper.incl |No |false a|
+If set to true, the upper bound term is included in the result set. The default is false.
+
+Example: `terms.upper.incl=true`
+
+|===
+
+The output is a list of the terms and their document frequency values. See below for examples.
+
+[[TheTermsComponent-Examples]]
+== Examples
+
+All of the following sample queries work with Solr's "`bin/solr -e techproducts`" example.
+
+[[TheTermsComponent-GetTop10Terms]]
+=== Get Top 10 Terms
+
+This query requests the first ten terms in the name field: `\http://localhost:8983/solr/techproducts/terms?terms.fl=name`
+
+Results:
+
+[source,xml]
+----
+<response>
+  <lst name="responseHeader">
+    <int name="status">0</int>
+    <int name="QTime">2</int>
+  </lst>
+  <lst name="terms">
+    <lst name="name">
+      <int name="one">5</int>
+      <int name="184">3</int>
+      <int name="1gb">3</int>
+      <int name="3200">3</int>
+      <int name="400">3</int>
+      <int name="ddr">3</int>
+      <int name="gb">3</int>
+      <int name="ipod">3</int>
+      <int name="memory">3</int>
+      <int name="pc">3</int>
+    </lst>
+  </lst>
+</response>
+----
+
+
+[[TheTermsComponent-GetFirst10TermsStartingwithLetter_a_]]
+=== Get First 10 Terms Starting with Letter 'a'
+
+This query requests the first ten terms in the name field, in index order (instead of the top 10 results by document count): `\http://localhost:8983/solr/techproducts/terms?terms.fl=name&terms.lower=a&terms.sort=index`
+
+Results:
+
+[source,xml]
+----
+<response>
+  <lst name="responseHeader">
+    <int name="status">0</int>
+    <int name="QTime">0</int>
+  </lst>
+  <lst name="terms">
+    <lst name="name">
+      <int name="a">1</int>
+      <int name="all">1</int>
+      <int name="apple">1</int>
+      <int name="asus">1</int>
+      <int name="ata">1</int>
+      <int name="ati">1</int>
+      <int name="belkin">1</int>
+      <int name="black">1</int>
+      <int name="british">1</int>
+      <int name="cable">1</int>
+    </lst>
+  </lst>
+</response>
+----
+
+[[TheTermsComponent-SolrJinvocation]]
+=== SolrJ Invocation
+
+[source,java]
+----
+import java.util.List;
+
+import org.apache.solr.client.solrj.SolrQuery;
+import org.apache.solr.client.solrj.request.QueryRequest;
+import org.apache.solr.client.solrj.response.TermsResponse;
+
+// Build a request against the /terms handler defined in solrconfig.xml.
+SolrQuery query = new SolrQuery();
+query.setRequestHandler("/terms");
+query.setTerms(true);
+query.setTermsLimit(5);
+query.setTermsLower("s");
+query.setTermsPrefix("s");
+query.addTermsField("terms_s");
+query.setTermsMinCount(1);
+
+// getSolrClient() is assumed to return a configured SolrClient instance.
+QueryRequest request = new QueryRequest(query);
+List<TermsResponse.Term> terms = request.process(getSolrClient()).getTermsResponse().getTerms("terms_s");
+----
+
+[[TheTermsComponent-UsingtheTermsComponentforanAuto-SuggestFeature]]
+== Using the Terms Component for an Auto-Suggest Feature
+
+If the <<suggester.adoc#suggester,Suggester>> doesn't suit your needs, you can use the Terms component in Solr to build a similar feature for your own search application. Simply submit a query specifying whatever characters the user has typed so far as a prefix. For example, if the user has typed "at", the search engine's interface would submit the following query:
+
+`\http://localhost:8983/solr/techproducts/terms?terms.fl=name&terms.prefix=at`
+
+Result:
+
+[source,xml]
+----
+<response>
+  <lst name="responseHeader">
+    <int name="status">0</int>
+    <int name="QTime">1</int>
+  </lst>
+  <lst name="terms">
+    <lst name="name">
+      <int name="ata">1</int>
+      <int name="ati">1</int>
+    </lst>
+  </lst>
+</response>
+----
+
+You can use the parameter `omitHeader=true` to omit the response header from the query response, like in this example, which also returns the response in JSON format: `\http://localhost:8983/solr/techproducts/terms?terms.fl=name&terms.prefix=at&indent=true&wt=json&omitHeader=true`
+
+Result:
+
+[source,json]
+----
+{
+  "terms": {
+    "name": [
+      "ata",
+      1,
+      "ati",
+      1
+    ]
+  }
+}
+----
+
+[[TheTermsComponent-DistributedSearchSupport]]
+== Distributed Search Support
+
+The TermsComponent also supports distributed indexes. For the `/terms` request handler, you must provide the following two parameters:
+
+// TODO: Change column width to %autowidth.spread when https://github.com/asciidoctor/asciidoctor-pdf/issues/599 is fixed
+
+[cols="30,70",options="header"]
+|===
+|Parameter |Description
+|shards |Specifies the shards in your distributed indexing configuration. For more information about distributed indexing, see <<distributed-search-with-index-sharding.adoc#distributed-search-with-index-sharding,Distributed Search with Index Sharding>>.
+|shards.qt |Specifies the request handler Solr uses for requests to shards.
+|===
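+
+For example, a sketch of a distributed request (the shard addresses are illustrative):
+
+`\http://localhost:8983/solr/techproducts/terms?terms.fl=name&shards=localhost:8983/solr/techproducts,localhost:7574/solr/techproducts&shards.qt=/terms`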
+
+[[TheTermsComponent-MoreResources]]
+== More Resources
+
+* {solr-javadocs}/solr-core/org/apache/solr/handler/component/TermsComponent.html[TermsComponent javadoc]

http://git-wip-us.apache.org/repos/asf/lucene-solr/blob/c8c2aab8/solr/solr-ref-guide/src/the-well-configured-solr-instance.adoc
----------------------------------------------------------------------
diff --git a/solr/solr-ref-guide/src/the-well-configured-solr-instance.adoc b/solr/solr-ref-guide/src/the-well-configured-solr-instance.adoc
new file mode 100644
index 0000000..d89019a
--- /dev/null
+++ b/solr/solr-ref-guide/src/the-well-configured-solr-instance.adoc
@@ -0,0 +1,27 @@
+= The Well-Configured Solr Instance
+:page-shortname: the-well-configured-solr-instance
+:page-permalink: the-well-configured-solr-instance.html
+:page-children: configuring-solrconfig-xml, solr-cores-and-solr-xml, configuration-apis, implicit-requesthandlers, solr-plugins, jvm-settings
+
+This section tells you how to fine-tune your Solr instance for optimum performance.
+
+This section covers the following topics:
+
+<<configuring-solrconfig-xml.adoc#configuring-solrconfig-xml,Configuring solrconfig.xml>>: Describes how to work with the main configuration file for Solr, `solrconfig.xml`, covering the major sections of the file.
+
+<<solr-cores-and-solr-xml.adoc#solr-cores-and-solr-xml,Solr Cores and solr.xml>>: Describes how to work with `solr.xml` and `core.properties` to configure your Solr core, or multiple Solr cores within a single instance.
+
+<<configuration-apis.adoc#configuration-apis,Configuration APIs>>: Describes several APIs used to configure Solr: Blob Store, Config, Request Parameters and Managed Resources.
+
+<<implicit-requesthandlers.adoc#implicit-requesthandlers,Implicit RequestHandlers>>: Describes various end-points automatically provided by Solr and how to configure them.
+
+<<solr-plugins.adoc#solr-plugins,Solr Plugins>>: Introduces Solr plugins with pointers to more information.
+
+<<jvm-settings.adoc#jvm-settings,JVM Settings>>: Gives some guidance on best practices for working with Java Virtual Machines.
+
+[IMPORTANT]
+====
+
+The focus of this section is generally on configuring a single Solr instance, but for those interested in scaling a Solr implementation in a cluster environment, see also the section <<solrcloud.adoc#solrcloud,SolrCloud>>. There are also options to scale through sharding or replication, described in the section <<legacy-scaling-and-distribution.adoc#legacy-scaling-and-distribution,Legacy Scaling and Distribution>>.
+
+====

http://git-wip-us.apache.org/repos/asf/lucene-solr/blob/c8c2aab8/solr/solr-ref-guide/src/thread-dump.adoc
----------------------------------------------------------------------
diff --git a/solr/solr-ref-guide/src/thread-dump.adoc b/solr/solr-ref-guide/src/thread-dump.adoc
new file mode 100644
index 0000000..e42d336
--- /dev/null
+++ b/solr/solr-ref-guide/src/thread-dump.adoc
@@ -0,0 +1,33 @@
+= Thread Dump
+:page-shortname: thread-dump
+:page-permalink: thread-dump.html
+
+The Thread Dump screen lets you inspect the currently active threads on your server.
+
+Each thread is listed and access to the stacktraces is available where applicable. Icons to the left indicate the state of the thread: for example, threads with a green check-mark in a green circle are in a "RUNNABLE" state. On the right of the thread name, a down-arrow means you can expand to see the stacktrace for that thread.
+
+.List of Threads
+image::images/thread-dump/thread_dump_1.png[image,width=484,height=250]
+
+
+When you move your cursor over a thread name, a box floats over the name with the state for that thread. Thread states can be:
+
+// TODO: Change column width to %autowidth.spread when https://github.com/asciidoctor/asciidoctor-pdf/issues/599 is fixed
+
+[cols="25,75",options="header"]
+|===
+|State |Meaning
+|NEW |A thread that has not yet started.
+|RUNNABLE |A thread executing in the Java virtual machine.
+|BLOCKED |A thread that is blocked waiting for a monitor lock.
+|WAITING |A thread that is waiting indefinitely for another thread to perform a particular action.
+|TIMED_WAITING |A thread that is waiting for another thread to perform an action for up to a specified waiting time.
+|TERMINATED |A thread that has exited.
+|===
+
+When you click on one of the threads that can be expanded, you'll see the stacktrace, as in the example below:
+
+.Inspecting a Thread
+image::images/thread-dump/thread_dump_2.png[image,width=453,height=250]
+
+You can also check the *Show all Stacktraces* button to automatically enable expansion for all threads.

http://git-wip-us.apache.org/repos/asf/lucene-solr/blob/c8c2aab8/solr/solr-ref-guide/src/tokenizers.adoc
----------------------------------------------------------------------
diff --git a/solr/solr-ref-guide/src/tokenizers.adoc b/solr/solr-ref-guide/src/tokenizers.adoc
new file mode 100644
index 0000000..0c0cf31
--- /dev/null
+++ b/solr/solr-ref-guide/src/tokenizers.adoc
@@ -0,0 +1,501 @@
+= Tokenizers
+:page-shortname: tokenizers
+:page-permalink: tokenizers.html
+
+Tokenizers are responsible for breaking field data into lexical units, or _tokens_.
+
+You configure the tokenizer for a text field type in `schema.xml` with a `<tokenizer>` element, as a child of `<analyzer>`:
+
+[source,xml]
+----
+<fieldType name="text" class="solr.TextField">
+  <analyzer type="index">
+    <tokenizer class="solr.StandardTokenizerFactory"/>
+    <filter class="solr.StandardFilterFactory"/>
+  </analyzer>
+</fieldType>
+----
+
+The class attribute names a factory class that will instantiate a tokenizer object when needed. Tokenizer factory classes implement `org.apache.solr.analysis.TokenizerFactory`. A TokenizerFactory's `create()` method accepts a Reader and returns a TokenStream. When Solr creates the tokenizer, it passes a Reader object that provides the content of the text field.
+
+Arguments may be passed to tokenizer factories by setting attributes on the `<tokenizer>` element.
+
+[source,xml]
+----
+<fieldType name="semicolonDelimited" class="solr.TextField">
+  <analyzer type="query">
+    <tokenizer class="solr.PatternTokenizerFactory" pattern="; "/>
+  </analyzer>
+</fieldType>
+----
+
+The following sections describe the tokenizer factory classes included in this release of Solr.
+
+For user tips about Solr's tokenizers, see http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters.
+
+[[Tokenizers-StandardTokenizer]]
+== Standard Tokenizer
+
+This tokenizer splits the text field into tokens, treating whitespace and punctuation as delimiters. Delimiter characters are discarded, with the following exceptions:
+
+* Periods (dots) that are not followed by whitespace are kept as part of the token; as a result, Internet domain names remain intact.
+* The "@" character is among the set of token-splitting punctuation, so email addresses are *not* preserved as single tokens.
+
+Note that words are split at hyphens.
+
+The Standard Tokenizer supports http://unicode.org/reports/tr29/#Word_Boundaries[Unicode standard annex UAX#29] word boundaries with the following token types: `<ALPHANUM>`, `<NUM>`, `<SOUTHEAST_ASIAN>`, `<IDEOGRAPHIC>`, and `<HIRAGANA>`.
+
+*Factory class:* `solr.StandardTokenizerFactory`
+
+*Arguments:*
+
+`maxTokenLength`: (integer, default 255) Solr ignores tokens that exceed the number of characters specified by `maxTokenLength`.
+
+*Example:*
+
+[source,xml]
+----
+<analyzer>
+  <tokenizer class="solr.StandardTokenizerFactory"/>
+</analyzer>
+----
+
+*In:* "Please, email john.doe@foo.com by 03-09, re: m37-xq."
+
+*Out:* "Please", "email", "john.doe", "foo.com", "by", "03", "09", "re", "m37", "xq"
+
+[[Tokenizers-ClassicTokenizer]]
+== Classic Tokenizer
+
+The Classic Tokenizer preserves the same behavior as the Standard Tokenizer of Solr versions 3.1 and previous. It does not use the http://unicode.org/reports/tr29/#Word_Boundaries[Unicode standard annex UAX#29] word boundary rules that the Standard Tokenizer uses. This tokenizer splits the text field into tokens, treating whitespace and punctuation as delimiters. Delimiter characters are discarded, with the following exceptions:
+
+* Periods (dots) that are not followed by whitespace are kept as part of the token.
+
+* Words are split at hyphens, unless there is a number in the word, in which case the token is not split and the numbers and hyphen(s) are preserved.
+
+* Recognizes Internet domain names and email addresses and preserves them as a single token.
+
+*Factory class:* `solr.ClassicTokenizerFactory`
+
+*Arguments:*
+
+`maxTokenLength`: (integer, default 255) Solr ignores tokens that exceed the number of characters specified by `maxTokenLength`.
+
+*Example:*
+
+[source,xml]
+----
+<analyzer>
+  <tokenizer class="solr.ClassicTokenizerFactory"/>
+</analyzer>
+----
+
+*In:* "Please, email john.doe@foo.com by 03-09, re: m37-xq."
+
+*Out:* "Please", "email", "john.doe@foo.com", "by", "03-09", "re", "m37-xq"
+
+[[Tokenizers-KeywordTokenizer]]
+== Keyword Tokenizer
+
+This tokenizer treats the entire text field as a single token.
+
+*Factory class:* `solr.KeywordTokenizerFactory`
+
+*Arguments:* None
+
+*Example:*
+
+[source,xml]
+----
+<analyzer>
+  <tokenizer class="solr.KeywordTokenizerFactory"/>
+</analyzer>
+----
+
+*In:* "Please, email john.doe@foo.com by 03-09, re: m37-xq."
+
+*Out:* "Please, email john.doe@foo.com by 03-09, re: m37-xq."
+
+[[Tokenizers-LetterTokenizer]]
+== Letter Tokenizer
+
+This tokenizer creates tokens from strings of contiguous letters, discarding all non-letter characters.
+
+*Factory class:* `solr.LetterTokenizerFactory`
+
+*Arguments:* None
+
+*Example:*
+
+[source,xml]
+----
+<analyzer>
+  <tokenizer class="solr.LetterTokenizerFactory"/>
+</analyzer>
+----
+
+*In:* "I can't."
+
+*Out:* "I", "can", "t"
+
+[[Tokenizers-LowerCaseTokenizer]]
+== Lower Case Tokenizer
+
+Tokenizes the input stream by delimiting at non-letters and then converting all letters to lowercase. Whitespace and non-letters are discarded.
+
+*Factory class:* `solr.LowerCaseTokenizerFactory`
+
+*Arguments:* None
+
+*Example:*
+
+[source,xml]
+----
+<analyzer>
+  <tokenizer class="solr.LowerCaseTokenizerFactory"/>
+</analyzer>
+----
+
+*In:* "I just \*LOVE* my iPhone!"
+
+*Out:* "i", "just", "love", "my", "iphone"
+
+[[Tokenizers-N-GramTokenizer]]
+== N-Gram Tokenizer
+
+Reads the field text and generates n-gram tokens of sizes in the given range.
+
+*Factory class:* `solr.NGramTokenizerFactory`
+
+*Arguments:*
+
+`minGramSize`: (integer, default 1) The minimum n-gram size, must be > 0.
+
+`maxGramSize`: (integer, default 2) The maximum n-gram size, must be >= `minGramSize`.
+
+*Example:*
+
+Default behavior. Note that this tokenizer operates over the whole field. It does not break the field at whitespace. As a result, the space character is included in the n-grams it produces.
+
+[source,xml]
+----
+<analyzer>
+  <tokenizer class="solr.NGramTokenizerFactory"/>
+</analyzer>
+----
+
+*In:* "hey man"
+
+*Out:* "h", "e", "y", " ", "m", "a", "n", "he", "ey", "y ", " m", "ma", "an"
+
+*Example:*
+
+With an n-gram size range of 4 to 5:
+
+[source,xml]
+----
+<analyzer>
+  <tokenizer class="solr.NGramTokenizerFactory" minGramSize="4" maxGramSize="5"/>
+</analyzer>
+----
+
+*In:* "bicycle"
+
+*Out:* "bicy", "bicyc", "icyc", "icycl", "cycl", "cycle", "ycle"
+
+[[Tokenizers-EdgeN-GramTokenizer]]
+== Edge N-Gram Tokenizer
+
+Reads the field text and generates edge n-gram tokens of sizes in the given range.
+
+*Factory class:* `solr.EdgeNGramTokenizerFactory`
+
+*Arguments:*
+
+`minGramSize`: (integer, default is 1) The minimum n-gram size, must be > 0.
+
+`maxGramSize`: (integer, default is 1) The maximum n-gram size, must be >= `minGramSize`.
+
+`side`: ("front" or "back", default is "front") Whether to compute the n-grams from the beginning (front) of the text or from the end (back).
+
+*Example:*
+
+Default behavior (min and max default to 1):
+
+[source,xml]
+----
+<analyzer>
+  <tokenizer class="solr.EdgeNGramTokenizerFactory"/>
+</analyzer>
+----
+
+*In:* "babaloo"
+
+*Out:* "b"
+
+*Example:*
+
+Edge n-gram range of 2 to 5
+
+[source,xml]
+----
+<analyzer>
+  <tokenizer class="solr.EdgeNGramTokenizerFactory" minGramSize="2" maxGramSize="5"/>
+</analyzer>
+----
+
+*In:* "babaloo"
+
+**Out:**"ba", "bab", "baba", "babal"
+
+*Example:*
+
+Edge n-gram range of 2 to 5, from the back side:
+
+[source,xml]
+----
+<analyzer>
+  <tokenizer class="solr.EdgeNGramTokenizerFactory" minGramSize="2" maxGramSize="5" side="back"/>
+</analyzer>
+----
+
+*In:* "babaloo"
+
+*Out:* "oo", "loo", "aloo", "baloo"
+
+[[Tokenizers-ICUTokenizer]]
+== ICU Tokenizer
+
+This tokenizer processes multilingual text and tokenizes it appropriately based on its script attribute.
+
+You can customize this tokenizer's behavior by specifying http://userguide.icu-project.org/boundaryanalysis#TOC-RBBI-Rules[per-script rule files]. To add per-script rules, add a `rulefiles` argument, which should contain a comma-separated list of `code:rulefile` pairs in the following format: four-letter ISO 15924 script code, followed by a colon, then a resource path. For example, to specify rules for Latin (script code "Latn") and Cyrillic (script code "Cyrl"), you would enter `Latn:my.Latin.rules.rbbi,Cyrl:my.Cyrillic.rules.rbbi`.
+
+The default `solr.ICUTokenizerFactory` provides UAX#29 word break rules tokenization (like `solr.StandardTokenizer`), but also includes custom tailorings for Hebrew (special handling of double and single quotation marks), and syllable tokenization for Khmer, Lao, and Myanmar.
+
+*Factory class:* `solr.ICUTokenizerFactory`
+
+*Arguments:*
+
+`rulefiles`: A comma-separated list of `code:rulefile` pairs in the following format: four-letter ISO 15924 script code, followed by a colon, then a resource path.
+
+*Example:*
+
+[source,xml]
+----
+<analyzer>
+  <!-- no customization -->
+  <tokenizer class="solr.ICUTokenizerFactory"/>
+</analyzer>
+----
+
+[source,xml]
+----
+<analyzer>
+  <tokenizer class="solr.ICUTokenizerFactory"
+             rulefiles="Latn:my.Latin.rules.rbbi,Cyrl:my.Cyrillic.rules.rbbi"/>
+</analyzer>
+----
+
+[IMPORTANT]
+====
+
+To use this tokenizer, you must add additional .jars to Solr's classpath (as described in the section <<lib-directives-in-solrconfig.adoc#lib-directives-in-solrconfig,Lib Directives in SolrConfig>>). See the `solr/contrib/analysis-extras/README.txt` file for information on which jars you need to add to your `SOLR_HOME/lib` directory.
+
+====
+
+[[Tokenizers-PathHierarchyTokenizer]]
+== Path Hierarchy Tokenizer
+
+This tokenizer creates tokens from file path hierarchies: for each level of the path, it emits a token containing the path down to and including that level.
+
+*Factory class:* `solr.PathHierarchyTokenizerFactory`
+
+*Arguments:*
+
+`delimiter`: (character, no default) Specifies the path separator character in the input text. This can be useful for working with backslash-delimited paths.
+
+`replace`: (character, no default) Specifies the delimiter character Solr uses in the tokenized output.
+
+*Example:*
+
+[source,xml]
+----
+<fieldType name="text_path" class="solr.TextField" positionIncrementGap="100">
+  <analyzer>
+    <tokenizer class="solr.PathHierarchyTokenizerFactory" delimiter="\" replace="/"/>
+  </analyzer>
+</fieldType>
+----
+
+*In:* "c:\usr\local\apache"
+
+*Out:* "c:", "c:/usr", "c:/usr/local", "c:/usr/local/apache"
+
+[[Tokenizers-RegularExpressionPatternTokenizer]]
+== Regular Expression Pattern Tokenizer
+
+This tokenizer uses a Java regular expression to break the input text stream into tokens. The expression provided by the `pattern` argument can be interpreted either as a delimiter that separates tokens, or as a pattern whose matches should be extracted from the text as tokens.
+
+See http://docs.oracle.com/javase/8/docs/api/java/util/regex/Pattern.html[the Javadocs for `java.util.regex.Pattern`] for more information on Java regular expression syntax.
+
+*Factory class:* `solr.PatternTokenizerFactory`
+
+*Arguments:*
+
+`pattern`: (Required) The regular expression, as defined in `java.util.regex.Pattern`.
+
+`group`: (Optional, default -1) Specifies which regex group to extract as the token(s). The value -1 means the regex should be treated as a delimiter that separates tokens. Non-negative group numbers (>= 0) indicate that character sequences matching that regex group should be converted to tokens. Group zero refers to the entire regex, groups greater than zero refer to parenthesized sub-expressions of the regex, counted from left to right.
+
+*Example:*
+
+A comma-separated list. Tokens are separated by a sequence of zero or more spaces, a comma, and zero or more spaces.
+
+[source,xml]
+----
+<analyzer>
+  <tokenizer class="solr.PatternTokenizerFactory" pattern="\s*,\s*"/>
+</analyzer>
+----
+
+*In:* "fee,fie, foe , fum, foo"
+
+*Out:* "fee", "fie", "foe", "fum", "foo"
+
+*Example:*
+
+Extract simple, capitalized words. A sequence of at least one capital letter followed by zero or more letters of either case is extracted as a token.
+
+[source,xml]
+----
+<analyzer>
+  <tokenizer class="solr.PatternTokenizerFactory" pattern="[A-Z][A-Za-z]*" group="0"/>
+</analyzer>
+----
+
+*In:* "Hello. My name is Inigo Montoya. You killed my father. Prepare to die."
+
+*Out:* "Hello", "My", "Inigo", "Montoya", "You", "Prepare"
+
+*Example:*
+
+Extract part numbers which are preceded by "SKU", "Part" or "Part Number", case sensitive, with an optional colon separator. Part numbers must be all numeric digits, with an optional hyphen. Regex capture groups are numbered by counting left parentheses from left to right. Group 3 is the subexpression "[0-9-]+", which matches one or more digits or hyphens.
+
+[source,xml]
+----
+<analyzer>
+  <tokenizer class="solr.PatternTokenizerFactory" pattern="(SKU|Part(\sNumber)?):?\s(\[0-9-\]+)" group="3"/>
+</analyzer>
+----
+
+*In:* "SKU: 1234, Part Number 5678, Part: 126-987"
+
+*Out:* "1234", "5678", "126-987"
+
+[[Tokenizers-SimplifiedRegularExpressionPatternTokenizer]]
+== Simplified Regular Expression Pattern Tokenizer
+
+This tokenizer is similar to the `PatternTokenizerFactory` described above, but uses Lucene {lucene-javadocs}/core/org/apache/lucene/util/automaton/RegExp.html[`RegExp`] pattern matching to construct distinct tokens for the input stream. The syntax is more limited than `PatternTokenizerFactory`, but the tokenization is quite a bit faster.
+
+*Factory class:* `solr.SimplePatternTokenizerFactory`
+
+*Arguments:*
+
+`pattern`: (Required) The regular expression, as defined in the {lucene-javadocs}/core/org/apache/lucene/util/automaton/RegExp.html[`RegExp`] javadocs, identifying the characters to include in tokens. The matching is greedy, such that the longest token matching at a given point is created. Empty tokens are never created.
+
+`maxDeterminizedStates`: (Optional, default 10000) The limit on the total state count for the determinized automaton computed from the regexp.
+
+*Example:*
+
+To match tokens delimited by simple whitespace characters:
+
+[source,xml]
+----
+<analyzer>
+  <tokenizer class="solr.SimplePatternTokenizerFactory" pattern="[^ \t\r\n]+"/>
+</analyzer>
+----
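+
+For instance, with the configuration above (an illustrative pair, not from the original page):
+
+*In:* "To be, or what?"
+
+*Out:* "To", "be,", "or", "what?"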
+
+[[Tokenizers-SimplifiedRegularExpressionPatternSplittingTokenizer]]
+== Simplified Regular Expression Pattern Splitting Tokenizer
+
+This tokenizer is similar to the `SimplePatternTokenizerFactory` described above, but uses Lucene {lucene-javadocs}/core/org/apache/lucene/util/automaton/RegExp.html[`RegExp`] pattern matching to identify sequences of characters that should be used to split tokens. The syntax is more limited than `PatternTokenizerFactory`, but the tokenization is quite a bit faster.
+
+*Factory class:* `solr.SimplePatternSplitTokenizerFactory`
+
+*Arguments:*
+
+`pattern`: (Required) The regular expression, as defined in the {lucene-javadocs}/core/org/apache/lucene/util/automaton/RegExp.html[`RegExp`] javadocs, identifying the characters that should split tokens. The matching is greedy, such that the longest token separator matching at a given point is matched. Empty tokens are never created.
+
+`maxDeterminizedStates`: (Optional, default 10000) The limit on the total state count for the determinized automaton computed from the regexp.
+
+*Example:*
+
+To match tokens delimited by simple whitespace characters:
+
+[source,xml]
+----
+<analyzer>
+  <tokenizer class="solr.SimplePatternSplitTokenizerFactory" pattern="[ \t\r\n]+"/>
+</analyzer>
+----
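+
+For instance, with the configuration above (an illustrative pair, not from the original page):
+
+*In:* "To be, or what?"
+
+*Out:* "To", "be,", "or", "what?"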
+
+[[Tokenizers-UAX29URLEmailTokenizer]]
+== UAX29 URL Email Tokenizer
+
+This tokenizer splits the text field into tokens, treating whitespace and punctuation as delimiters. Delimiter characters are discarded, with the following exceptions:
+
+* Periods (dots) that are not followed by whitespace are kept as part of the token.
+
+* Words are split at hyphens, unless there is a number in the word, in which case the token is not split and the numbers and hyphen(s) are preserved.
+
+* Recognizes and preserves as single tokens the following:
+** Internet domain names containing top-level domains validated against the white list in the http://www.internic.net/zones/root.zone[IANA Root Zone Database] when the tokenizer was generated
+** email addresses
+** `file://`, `http(s)://`, and `ftp://` URLs
+** IPv4 and IPv6 addresses
+
+The UAX29 URL Email Tokenizer supports http://unicode.org/reports/tr29/#Word_Boundaries[Unicode standard annex UAX#29] word boundaries with the following token types: `<ALPHANUM>`, `<NUM>`, `<URL>`, `<EMAIL>`, `<SOUTHEAST_ASIAN>`, `<IDEOGRAPHIC>`, and `<HIRAGANA>`.
+
+*Factory class:* `solr.UAX29URLEmailTokenizerFactory`
+
+*Arguments:*
+
+`maxTokenLength`: (integer, default 255) Solr ignores tokens that exceed the number of characters specified by `maxTokenLength`.
+
+*Example:*
+
+[source,xml]
+----
+<analyzer>
+  <tokenizer class="solr.UAX29URLEmailTokenizerFactory"/>
+</analyzer>
+----
+
+*In:* "Visit http://accarol.com/contact.htm?from=external&a=10 or e-mail bob.cratchet@accarol.com"
+
+*Out:* "Visit", "http://accarol.com/contact.htm?from=external&a=10", "or", "e", "mail", "bob.cratchet@accarol.com"
+
+[[Tokenizers-WhiteSpaceTokenizer]]
+== White Space Tokenizer
+
+Simple tokenizer that splits the text stream on whitespace and returns sequences of non-whitespace characters as tokens. Note that any punctuation _will_ be included in the tokens.
+
+*Factory class:* `solr.WhitespaceTokenizerFactory`
+
+*Arguments:*
+
+`rule`: Specifies how to define whitespace for the purpose of tokenization. Valid values:
+
+* `java`: (Default) Uses https://docs.oracle.com/javase/8/docs/api/java/lang/Character.html#isWhitespace-int-[Character.isWhitespace(int)]
+* `unicode`: Uses Unicode's WHITESPACE property
+
+*Example:*
+
+[source,xml]
+----
+<analyzer>
+  <tokenizer class="solr.WhitespaceTokenizerFactory" rule="java" />
+</analyzer>
+----
+
+*In:* "To be, or what?"
+
+*Out:* "To", "be,", "or", "what?"

http://git-wip-us.apache.org/repos/asf/lucene-solr/blob/c8c2aab8/solr/solr-ref-guide/src/transforming-and-indexing-custom-json.adoc
----------------------------------------------------------------------
diff --git a/solr/solr-ref-guide/src/transforming-and-indexing-custom-json.adoc b/solr/solr-ref-guide/src/transforming-and-indexing-custom-json.adoc
new file mode 100644
index 0000000..8db3123
--- /dev/null
+++ b/solr/solr-ref-guide/src/transforming-and-indexing-custom-json.adoc
@@ -0,0 +1,362 @@
+= Transforming and Indexing Custom JSON
+:page-shortname: transforming-and-indexing-custom-json
+:page-permalink: transforming-and-indexing-custom-json.html
+
+If you have JSON documents that you would like to index without transforming them into Solr's structure, you can add them to Solr by including some parameters with the update request. These parameters provide information on how to split a single JSON file into multiple Solr documents and how to map fields to Solr's schema. One or more valid JSON documents can be sent to the `/update/json/docs` path with the configuration params.
+
+[[TransformingandIndexingCustomJSON-MappingParameters]]
+== Mapping Parameters
+
+These parameters allow you to define how a JSON file should be read for multiple Solr documents.
+
+* *split*: Defines the path at which to split the input JSON into multiple Solr documents and is required if you have multiple documents in a single JSON file. If the entire JSON makes a single Solr document, the path must be "`/`". It is possible to pass multiple split paths by separating them with a pipe (`|`), for example: `split=/|/foo|/foo/bar`. If one path is a child of another, they automatically become a child document.
+* *f*: This is a multivalued mapping parameter. The format of the parameter is `target-field-name:json-path`. The `json-path` is required. The `target-field-name` is the Solr document field name, and is optional. If not specified, it is automatically derived from the input JSON; the default target field name is the fully qualified name of the field. Wildcards can be used here; see the section <<TransformingandIndexingCustomJSON-Wildcards,Wildcards>> below for more information.
+* *mapUniqueKeyOnly* (boolean): This parameter is particularly convenient when the fields in the input JSON are not available in the schema and <<schemaless-mode.adoc#schemaless-mode,schemaless mode>> is not enabled. It indexes all the fields into the default search field (set with the `df` parameter, below), and only the `uniqueKey` field is mapped to the corresponding field in the schema. If the input JSON does not have a value for the `uniqueKey` field, a UUID is generated for it.
+* *df*: If the `mapUniqueKeyOnly` flag is used, the update handler needs a field in which to index the data. This is the same field that other handlers use as a default search field.
+* *srcField*: The name of the field in which the JSON source document will be stored. This can only be used if `split=/` (i.e., you want your JSON input file to be indexed as a single Solr document). Note that atomic updates will cause the field to be out of sync with the document.
+* *echo*: This is for debugging purposes only. Set it to `true` if you want the docs to be returned as a response; nothing will be indexed.
+
+For example, if we have a JSON file that includes two documents, we could define an update request like this:
+
+[source,bash]
+----
+curl 'http://localhost:8983/solr/my_collection/update/json/docs'\
+'?split=/exams'\
+'&f=first:/first'\
+'&f=last:/last'\
+'&f=grade:/grade'\
+'&f=subject:/exams/subject'\
+'&f=test:/exams/test'\
+'&f=marks:/exams/marks'\
+ -H 'Content-type:application/json' -d '
+{
+  "first": "John",
+  "last": "Doe",
+  "grade": 8,
+  "exams": [
+    {
+      "subject": "Maths",
+      "test"   : "term1",
+      "marks"  : 90},
+    {
+      "subject": "Biology",
+      "test"   : "term1",
+      "marks"  : 86}
+  ]
+}'
+----
+
+You can store and reuse the params by using <<request-parameters-api.adoc#request-parameters-api,Request Parameters>>.
+
+[source,bash]
+----
+curl http://localhost:8983/solr/my_collection/config/params -H 'Content-type:application/json' -d '{
+  "set": {
+    "my_params": {
+      "split": "/exams",
+      "f": ["first:/first","last:/last","grade:/grade","subject:/exams/subject","test:/exams/test"]
+    }}}'
+
+and use it as follows:
+
+[source,bash]
+----
+curl 'http://localhost:8983/solr/my_collection/update/json/docs?useParams=my_params' -H 'Content-type:application/json' -d '{
+  "first": "John",
+  "last": "Doe",
+  "grade": 8,
+  "exams": [
+    {
+      "subject": "Maths",
+      "test"   : "term1",
+      "marks"  : 90},
+    {
+      "subject": "Biology",
+      "test"   : "term1",
+      "marks"  : 86}
+  ]
+}'
+
+With this request, we have defined that "exams" contains multiple documents. In addition, we have mapped several fields from the input document to Solr fields.
+
+When the update request is complete, the following two documents will be added to the index:
+
+[source,json]
+----
+{
+  "first":"John",
+  "last":"Doe",
+  "marks":90,
+  "test":"term1",
+  "subject":"Maths",
+  "grade":8
+}
+{
+  "first":"John",
+  "last":"Doe",
+  "marks":86,
+  "test":"term1",
+  "subject":"Biology",
+  "grade":8
+}
+----
+
+In the prior example, all of the fields we wanted to use in Solr had the same names as they did in the input JSON. When that is the case, we can simplify the request as follows:
+
+[source,bash]
+----
+curl 'http://localhost:8983/solr/my_collection/update/json/docs'\
+'?split=/exams'\
+'&f=/first'\
+'&f=/last'\
+'&f=/grade'\
+'&f=/exams/subject'\
+'&f=/exams/test'\
+'&f=/exams/marks'\
+ -H 'Content-type:application/json' -d '
+{
+  "first": "John",
+  "last": "Doe",
+  "grade": 8,
+  "exams": [
+    {
+      "subject": "Maths",
+      "test"   : "term1",
+      "marks"  : 90},
+    {
+      "subject": "Biology",
+      "test"   : "term1",
+      "marks"  : 86}
+  ]
+}'
+----
+
+In this example, we simply named the field paths (such as `/exams/test`). Solr will automatically attempt to add the content of the field from the JSON input to the index in a field with the same name.
+
+[TIP]
+====
+
+If you are working in <<schemaless-mode.adoc#schemaless-mode,Schemaless Mode>>, fields that don't exist will be created on the fly with Solr's best guess for the field type. If you are NOT using schemaless mode, documents WILL be rejected when they contain fields that do not exist in the schema, so pre-create those fields before indexing.
+
+====
+
+[[TransformingandIndexingCustomJSON-Wildcards]]
+== Wildcards
+
+Instead of specifying all the field names explicitly, it is possible to specify wildcards to map fields automatically. There are two restrictions: wildcards can only be used at the end of the `json-path`, and the split path cannot use wildcards. A single asterisk `\*` maps only to direct children, and a double asterisk `\*\*` maps recursively to all descendants. The following are example wildcard path mappings:
+
+* `f=$FQN:/**`: maps all fields to the fully qualified name (`$FQN`) of the JSON field. The fully qualified name is obtained by concatenating all the keys in the hierarchy with a period (`.`) as a delimiter. This is the default behavior if no `f` path mappings are specified.
+* `f=/docs/*`: maps all the fields directly under `/docs`, using the field names as given in the JSON
+* `f=/docs/**`: maps all the fields under `/docs` and its descendants, using the field names as given in the JSON
+* `f=searchField:/docs/*`: maps all fields directly under `/docs` to a single field called `searchField`
+* `f=searchField:/docs/**`: maps all fields under `/docs` and its descendants to `searchField`
+
+With wildcards we can further simplify our previous example as follows:
+
+[source,bash]
+----
+curl 'http://localhost:8983/solr/my_collection/update/json/docs'\
+'?split=/exams'\
+'&f=/**'\
+ -H 'Content-type:application/json' -d '
+{
+  "first": "John",
+  "last": "Doe",
+  "grade": 8,
+  "exams": [
+    {
+      "subject": "Maths",
+      "test"   : "term1",
+      "marks"  : 90},
+    {
+      "subject": "Biology",
+      "test"   : "term1",
+      "marks"  : 86}
+  ]
+}'
+----
+
+Because we want the fields to be indexed with the field names as they are found in the JSON input, the double wildcard in `f=/**` will map all fields and their descendants to the same fields in Solr.
+
+It is also possible to send all the values to a single field and do a full-text search on that. This is a good option for blindly indexing and querying JSON documents without worrying about fields and the schema.
+
+[source,bash]
+----
+curl 'http://localhost:8983/solr/my_collection/update/json/docs'\
+'?split=/'\
+'&f=txt:/**'\
+ -H 'Content-type:application/json' -d '
+{
+  "first": "John",
+  "last": "Doe",
+  "grade": 8,
+  "exams": [
+    {
+      "subject": "Maths",
+      "test"   : "term1",
+      "marks"  : 90},
+    {
+      "subject": "Biology",
+      "test"   : "term1",
+      "marks"  : 86}
+  ]
+}' 
+----
+
+In the above example, we've said all of the fields should be added to a field in Solr named 'txt'. Since this adds multiple values to a single field, the field you choose should be multi-valued.
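+
+A sketch of such a field definition in the schema, assuming the `text_general` field type from the default configset:
+
+[source,xml]
+----
+<!-- a multi-valued catch-all field for the JSON values -->
+<field name="txt" type="text_general" multiValued="true" indexed="true" stored="true"/>
+----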
+
+The default behavior is to use the fully qualified name (FQN) of the node. So, if we don't define any field mappings, like this:
+
+[source,bash]
+----
+curl 'http://localhost:8983/solr/my_collection/update/json/docs?split=/exams'\
+    -H 'Content-type:application/json' -d '
+{
+  "first": "John",
+  "last": "Doe",
+  "grade": 8,
+  "exams": [
+    {
+      "subject": "Maths",
+      "test"   : "term1",
+      "marks"  : 90},
+    {
+      "subject": "Biology",
+      "test"   : "term1",
+      "marks"  : 86}
+  ]
+}'
+----
+
+The documents would be added to the index with fields that look like this:
+
+[source,json]
+----
+{
+  "first":"John",
+  "last":"Doe",
+  "grade":8,
+  "exams.subject":"Maths",
+  "exams.test":"term1",
+  "exams.marks":90},
+{
+  "first":"John",
+  "last":"Doe",
+  "grade":8,
+  "exams.subject":"Biology",
+  "exams.test":"term1",
+  "exams.marks":86}
+----
+
+[[TransformingandIndexingCustomJSON-MultipledocumentsinaSinglePayload]]
+== Multiple Documents in a Single Payload
+
+This functionality supports documents in the http://jsonlines.org/[JSON Lines] format (`.jsonl`), which specifies one document per line.
+
+For example:
+
+[source,bash]
+----
+curl 'http://localhost:8983/solr/my_collection/update/json/docs' -H 'Content-type:application/json' -d '
+{ "first":"Steve", "last":"Jobs", "grade":1, "subject": "Social Science", "test" : "term1", "marks" : 90}
+{ "first":"Steve", "last":"Woz", "grade":1, "subject": "Political Science", "test" : "term1", "marks" : 86}'
+----
+
+Or even an array of documents, as in this example:
+
+[source,bash]
+----
+curl 'http://localhost:8983/solr/my_collection/update/json/docs' -H 'Content-type:application/json' -d '[
+{ "first":"Steve", "last":"Jobs", "grade":1, "subject": "Computer Science", "test"   : "term1", "marks"  : 90},
+{ "first":"Steve", "last":"Woz", "grade":1, "subject": "Calculus", "test"   : "term1", "marks"  : 86}]'
+----
+
+[[TransformingandIndexingCustomJSON-IndexingNestedDocuments]]
+== Indexing Nested Documents
+
+The following is an example of indexing nested documents:
+
+[source,bash]
+----
+curl 'http://localhost:8983/solr/my_collection/update/json/docs?split=/|/orgs'\
+    -H 'Content-type:application/json' -d '{
+  "name": "Joe Smith",
+  "phone": 876876687,
+  "orgs": [
+    {
+      "name": "Microsoft",
+      "city": "Seattle",
+      "zip": 98052
+    },
+    {
+      "name": "Apple",
+      "city": "Cupertino",
+      "zip": 95014
+    }
+  ]
+}'
+----
+
+With this example, the documents indexed would be as follows:
+
+[source,json]
+----
+{
+  "name":"Joe Smith",
+  "phone":876876687,
+  "_childDocuments_":[
+    {
+      "name":"Microsoft",
+      "city":"Seattle",
+      "zip":98052},
+    {
+      "name":"Apple",
+      "city":"Cupertino",
+      "zip":95014}]}
+----
+
+[[TransformingandIndexingCustomJSON-TipsforCustomJSONIndexing]]
+== Tips for Custom JSON Indexing
+
+1.  Schemaless mode: This handles field creation automatically. The field guessing may not be exactly as you expect, but it works. The best thing to do is to set up a local server in schemaless mode, index a few sample docs, and create those fields in your real setup with proper field types before indexing.
+2.  Pre-created schema: Post your docs to the `/update/json/docs` endpoint with `echo=true`. This gives you the list of field names you need to create. Create the fields before you actually index; see the sketch after this list.
+3.  No schema, only full-text search: If all you need is full-text search over your JSON, set the configuration as described in the <<TransformingandIndexingCustomJSON-SettingJSONDefaults,Setting JSON Defaults>> section below.
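+
+A sketch of the `echo=true` technique from tip 2, reusing the sample data from earlier in this section; the response lists the documents (and thus the field names to create) without indexing anything:
+
+[source,bash]
+----
+curl 'http://localhost:8983/solr/my_collection/update/json/docs?split=/exams&echo=true'\
+ -H 'Content-type:application/json' -d '
+{
+  "first": "John",
+  "last": "Doe",
+  "grade": 8,
+  "exams": [
+    { "subject": "Maths", "test": "term1", "marks": 90 },
+    { "subject": "Biology", "test": "term1", "marks": 86 }
+  ]
+}'
+----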
+
+[[TransformingandIndexingCustomJSON-SettingJSONDefaults]]
+== Setting JSON Defaults
+
+It is possible to send any JSON to the `/update/json/docs` endpoint; the default configuration of the component is as follows:
+
+[source,xml]
+----
+<initParams path="/update/json/docs">
+  <lst name="defaults">
+    <!-- this ensures that the entire json doc will be stored verbatim into one field -->
+    <str name="srcField">_src_</str>
+    <!-- This means the uniqueKey field will be extracted from the fields, and
+         all fields go into the 'df' field. In this config, df is already configured to be 'text'.
+     -->
+    <str name="mapUniqueKeyOnly">true</str>
+    <!-- The default search field where all the values are indexed to -->
+    <str name="df">text</str>
+  </lst>
+</initParams>
+----
+
+So, if no params are passed, the entire JSON document is indexed to the `\_src_` field and all the values in the input JSON go to a field named `text`. If a value for the uniqueKey field is present it is stored; if no value can be obtained from the input JSON, a UUID is created and used as the uniqueKey field value.
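+
+As an illustrative sketch (assuming the default `/query` request handler), a free-text search for one of the indexed values would then match against the `text` field:
+
+[source,bash]
+----
+curl 'http://localhost:8983/solr/my_collection/query?q=Maths&df=text'
+----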
+
+Alternatively, use the Request Parameters feature to set these params:
+
+[source,bash]
+----
+curl http://localhost:8983/solr/my_collection/config/params -H 'Content-type:application/json' -d '{
+  "set": {
+    "full_txt": {
+      "srcField": "_src_",
+      "mapUniqueKeyOnly": true,
+      "df": "text"
+    }}}'
+----
+
+Send the parameter `useParams=full_txt` with each request.
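+
+For example (a sketch; the document body is illustrative):
+
+[source,bash]
+----
+curl 'http://localhost:8983/solr/my_collection/update/json/docs?useParams=full_txt'\
+ -H 'Content-type:application/json' -d '{"first":"Steve","last":"Jobs","grade":1}'
+----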