Posted to solr-commits@lucene.apache.org by Apache Wiki <wi...@apache.org> on 2010/12/12 03:57:08 UTC

[Solr Wiki] Update of "SchemaXml" by MeljeanLegaspi

The "SchemaXml" page has been changed by MeljeanLegaspi.
The comment on this change is: Replaced <fieldtypes> with <types> in the Miscellaneous Settings section of the document.
http://wiki.apache.org/solr/SchemaXml?action=diff&rev1=40&rev2=41

--------------------------------------------------

  <<TableOfContents>>
  
  == Data Types ==
- 
  The `<types>` section allows you to define a list of `<fieldtype>` declarations you wish to use in your schema, along with the underlying Solr class that should be used for that type, as well as the default options you want for fields that use that type.
  
  Any subclass of [[http://lucene.apache.org/solr/docs/api/org/apache/solr/schema/FieldType.html|FieldType]] may be used as a field type class, using either its full package name or the "solr" alias if it is in the default Solr package.  For common numeric types (integer, float, etc.), multiple implementations are provided to suit different needs.  See SolrPlugins for information on how to ensure that your own custom field types can be loaded into Solr.
  
-   Common options that field types can have are...
+  . Common options that field types can have are...
- 
-    * `sortMissingLast=true|false`
+   * `sortMissingLast=true|false`
-    * `sortMissingFirst=true|false`
+   * `sortMissingFirst=true|false`
-    * `indexed=true|false`
+   * `indexed=true|false`
-    * `stored=true|false`
+   * `stored=true|false`
-    * `multiValued=true|false`
+   * `multiValued=true|false`
-    * `omitNorms=true|false` 
+   * `omitNorms=true|false`
-    * `omitTermFreqAndPositions=true|false` <!> [[Solr1.4]]
+   * `omitTermFreqAndPositions=true|false` <!> [[Solr1.4]]
-    * `positionIncrementGap=N`
+   * `positionIncrementGap=N`
  
  `TextField`s can also support Analyzers with highly configurable [[AnalyzersTokenizersTokenFilters|Tokenizers and Token Filters]].
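  
  As a rough illustration of how these options combine, a `<types>` section might look like the sketch below.  The type names `string` and `text_example` and the particular tokenizer/filter chain are assumptions chosen for this example, not requirements.
  
  {{{
  <types>
    <!-- Untokenized string type; documents missing the field sort last -->
    <fieldType name="string" class="solr.StrField" sortMissingLast="true" omitNorms="true"/>
  
    <!-- Analyzed text type; the analyzer chain here is only illustrative -->
    <fieldType name="text_example" class="solr.TextField" positionIncrementGap="100">
      <analyzer>
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
      </analyzer>
    </fieldType>
  </types>
  }}}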
  
@@ -31, +29 @@

  
  Field types that store text (`TextField`, `StrField`) support compression of stored contents:
  
-    * `compressed=true|false`
+  * `compressed=true|false`
-    * `compressThreshold=<integer>`
+  * `compressThreshold=<integer>`
  
  `compressThreshold` is the minimum length required for text compression to be invoked.  This applies only if `compressed=true`; a common pattern is to set `compressThreshold` on the field type definition, and turn compression on and off in the individual field definitions.
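  
  A minimal sketch of that pattern, assuming a hypothetical field type `text_big`, hypothetical fields `body` and `summary`, and an arbitrary threshold of 1024:
  
  {{{
  <!-- In <types>: set the threshold once on the type (1024 is an arbitrary example value) -->
  <fieldType name="text_big" class="solr.TextField" compressThreshold="1024"/>
  
  <!-- In <fields>: turn compression on or off per field -->
  <field name="body"    type="text_big" indexed="true" stored="true" compressed="true"/>
  <field name="summary" type="text_big" indexed="true" stored="true" compressed="false"/>
  }}}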
  
  === Poly Field Types ===
- /!\ Solr1.5 /!\
- See https://issues.apache.org/jira/browse/SOLR-1131.  This discusses uncommitted code.
+ /!\ Solr1.5 /!\ See https://issues.apache.org/jira/browse/SOLR-1131.  This discusses uncommitted code.
  
  Some !FieldTypes can be "poly" field types.  A Poly !FieldType is one that can potentially create multiple Fields per "declared" field.  The primary example in Solr is the PointType.  Depending on the dimension specified, one or more Fields will be created.  For example:
+ 
  {{{
  <fieldType name="location" class="solr.PointType" dimension="2" subFieldTypes="double"/>
  }}}
- 
  Declares a !FieldType that can be used to represent a point in 2 dimensions (i.e. a lat/lon).  The subFieldTypes value tells Solr what the underlying representation will be for the values in the field, in this case a !FieldType called "double".
  
  Thus, a Field declaration like:
+ 
  {{{
  <field name="store" type="location" indexed="true" stored="true"/>
  }}}
+ can be indexed like:
  
- can be indexed like:
  {{{
  <add>
  <doc>
@@ -60, +58 @@

  </doc>
  </add>
  }}}
- 
  Underneath the hood, Solr will create two fields (using dynamic fields) to store the information.
  
  == Fields ==
- 
- The `<fields>` section is where you list the individual `<field>` declarations you wish to use in your documents.  Each `<field>` has a `name` that you will use to reference it when adding documents or executing searches, and an associated `type` which identifies the name of the fieldtype you wish to use for this field. There are various field options that apply to a field. These can be set in the field type declarations, and can also be overridden at an individual field's declaration. 
+ The `<fields>` section is where you list the individual `<field>` declarations you wish to use in your documents.  Each `<field>` has a `name` that you will use to reference it when adding documents or executing searches, and an associated `type` which identifies the name of the fieldtype you wish to use for this field. There are various field options that apply to a field. These can be set in the field type declarations, and can also be overridden at an individual field's declaration.
  
  === Common field options ===
+ Common options that fields can have are...
  
- Common options that fields can have are...
   * `default`
    * The default value for this field if none is provided while adding documents
   * `indexed=true|false`
@@ -87, +83 @@

   * `omitTermFreqAndPositions=true|false` <!> [[Solr1.4]]
    * If set, omit term frequencies, positions and payloads from postings for this field. This can be a performance boost for fields that don't require that information, and it reduces the storage space required for the index. Queries that rely on positions (phrase queries, for example) will silently fail to find documents when issued against a field with this option.
  
- 
  See also FieldOptionsByUseCase, which discusses how these options should be set in various circumstances. See SolrPerformanceFactors for how different options can affect Solr performance.
  
  === Dynamic fields ===
- 
- One of the powerful features of Lucene is that you don't have to pre-define every field when you first create your index.  Even though Solr provides strong datatyping for fields, it still preserves that flexibility using "Dynamic Fields".  Using `<dynamicField>` declarations, you can create field rules that Solr will use to understand what datatype should be used whenever it is given a field name that is not explicitly defined, but matches a prefix or suffix used in a dynamicField.  
+ One of the powerful features of Lucene is that you don't have to pre-define every field when you first create your index.  Even though Solr provides strong datatyping for fields, it still preserves that flexibility using "Dynamic Fields".  Using `<dynamicField>` declarations, you can create field rules that Solr will use to understand what datatype should be used whenever it is given a field name that is not explicitly defined, but matches a prefix or suffix used in a dynamicField.
  
  For example the following dynamic field declaration tells Solr that whenever it sees a field name ending in "_i" which is not an explicitly defined field, then it should dynamically create an integer field with that name...
  
  {{{
      <dynamicField name="*_i"  type="integer"  indexed="true"  stored="true"/>
  }}}
- 
  === Indexing same data in multiple fields ===
- 
  Note that, with textual data, it will often make sense to take what's logically speaking a single field (e.g. product name) and index it into several different Solr fields, each with different field options and/or analyzers.
  
  As an example, if I had a field with a list of authors, such as:
  
-   ''Schildt, Herbert; Wolpert, Lewis; Davies, P.''
+  . ''Schildt, Herbert; Wolpert, Lewis; Davies, P.''
-   
+ 
- I might want to index the same data differently in three different fields (perhaps using the Solr [[SchemaXml#Copy Fields|copyField]] directive):
+ I might want to index the same data differently in three different fields (perhaps using the Solr [[#Copy_Fields|copyField]] directive):
+ 
-   * For searching: Tokenized, case-folded, punctuation-stripped:
+  * For searching: Tokenized, case-folded, punctuation-stripped:
-       schildt / herbert / wolpert / lewis / davies / p
+   . schildt / herbert / wolpert / lewis / davies / p
-   * For sorting: Untokenized, case-folded, punctuation-stripped:
+  * For sorting: Untokenized, case-folded, punctuation-stripped:
-       schildt herbert wolpert lewis davies p
+   . schildt herbert wolpert lewis davies p
-   * For faceting: Primary author only, using a `solr.StringField`:
+  * For faceting: Primary author only, using a `solr.StrField`:
-       Schildt, Herbert
+   . Schildt, Herbert
  
  (See also SolrFacetingOverview.)
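  
  One possible way to set this up is sketched below.  All type and field names (`text_search`, `text_sort`, `authors`, `authors_sort`, `primary_author`) are hypothetical, and the analyzer chains are only one reasonable choice.  Because `copyField` copies the whole source value, the sketch assumes the indexing client supplies the primary author separately.
  
  {{{
  <!-- In <types> -->
  <!-- Searching: tokenized and case-folded; the tokenizer drops punctuation -->
  <fieldType name="text_search" class="solr.TextField" positionIncrementGap="100">
    <analyzer>
      <tokenizer class="solr.StandardTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
  </fieldType>
  <!-- Sorting: one token per value, case-folded, punctuation stripped -->
  <fieldType name="text_sort" class="solr.TextField" sortMissingLast="true" omitNorms="true">
    <analyzer>
      <tokenizer class="solr.KeywordTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
      <filter class="solr.PatternReplaceFilterFactory" pattern="[^a-z ]" replacement="" replace="all"/>
    </analyzer>
  </fieldType>
  <fieldType name="string" class="solr.StrField" omitNorms="true"/>
  
  <!-- In <fields> -->
  <field name="authors"        type="text_search" indexed="true" stored="true"/>
  <field name="authors_sort"   type="text_sort"   indexed="true" stored="false"/>
  <!-- Faceting: sent by the client, since copyField cannot extract part of a value -->
  <field name="primary_author" type="string"      indexed="true" stored="true"/>
  
  <!-- After </fields> -->
  <copyField source="authors" dest="authors_sort"/>
  }}}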
  
  === Expert field options ===
+ The storage of Lucene term vectors can be triggered using the following field options:
  
- The storage of Lucene term vectors can be triggered using the following field options:
-    * `termVectors=true|false`
+  * `termVectors=true|false`
-    * `termPositions=true|false`
+  * `termPositions=true|false`
-    * `termOffsets=true|false`
+  * `termOffsets=true|false`
  
  These options can be used to accelerate highlighting and other ancillary functionality, but impose a substantial cost in terms of index size.  They are ''not'' necessary for typical uses of Solr (phrase queries, etc., do not require these settings to be present).
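  
  For instance, a field intended mainly for highlighting might enable all three options.  The field name `content` and type name `text` below are assumptions for this sketch:
  
  {{{
  <!-- Stores term vectors with positions and offsets, at the cost of a larger index -->
  <field name="content" type="text" indexed="true" stored="true"
         termVectors="true" termPositions="true" termOffsets="true"/>
  }}}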
  
  == Miscellaneous Settings ==
- 
- In addition to the `<fieldtypes>` and `<fields>` sections of the schema, there are several other declarations that can appear in your schema....
+ In addition to the `<types>` and `<fields>` sections of the schema, there are several other declarations that can appear in your schema.
  
  === The Unique Key Field ===
- 
  The `<uniqueKey>` declaration can be used to inform Solr that there is a field in your index which should be unique for all documents.  If a document is added that contains the same value for this field as an existing document, the old document will be deleted.
  
  It is not mandatory for a schema to have a uniqueKey field.
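  
  A short sketch, assuming a hypothetical field named `id` of a string type named `string`:
  
  {{{
  <!-- In <fields>: the field that holds the key -->
  <field name="id" type="string" indexed="true" stored="true" required="true"/>
  
  <!-- After </fields>: name that field as the unique key -->
  <uniqueKey>id</uniqueKey>
  }}}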
  
  === The Default Search Field ===
- 
  The `<defaultSearchField>` is used by Solr when parsing queries to identify which field name should be searched in queries where an explicit field name has not been used.
  
  /!\ :TODO: /!\ check whether this option is also used by the DisMaxRequestHandler and not only by the StandardRequestHandler
  
  === Default query parser operator ===
- 
  The default operator used by Solr's query parser ([[http://lucene.apache.org/solr/docs/api/org/apache/solr/search/SolrQueryParser.html|SolrQueryParser]]) can be configured with `<solrQueryParser defaultOperator="AND|OR"/>`.  The default operator is "OR" if unspecified.
  
+ <<Anchor(copyField)>>
  
- <<Anchor(copyField)>>
  === Copy Fields ===
- 
  Any number of `<copyField>` declarations can be included in your schema, to instruct Solr that you want it to duplicate any data it sees in the "source" field of documents that are added to the index, in the "dest" field of that document.  You are responsible for ensuring that the datatypes of the fields are compatible. The original text is sent from the "source" field to the "dest" field, before any configured analyzers for the originating or destination field are invoked.
  
+ This is provided as a convenient way to ensure that data is put into several fields, without needing to include the data in the update command multiple times. The maxChars property may be used in a copyField declaration.   This simply limits the number of characters copied.  For example:
- This is provided as a convenient way to ensure that data is put into several fields, without needing to include the data in the update command multiple times.
- The maxChars property may be used in a copyField declaration.   This simply limits the number of characters copied.  For example:
  
  {{{
   <copyField source="body" dest="teaser" maxChars="300"/>
  }}}
+ A common requirement is to copy or merge all input fields into a single solr field. This can be done as follows:-
  
- A common requirement is to copy or merge all input fields into a single solr field. This can be done as follows:-
  {{{
   <copyField source="*" dest="text"/>
  }}}
- 
- 
- 
  === Similarity ===
  A `<similarity>` declaration can be used to specify the subclass of Similarity that you want Solr to use when dealing with your index.  If no Similarity class is specified, the Lucene !DefaultSimilarity is used.  Please see SolrPlugins for information on how to ensure that your own custom Similarity can be loaded into Solr.
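  
  As an illustration, the declaration takes a `class` attribute naming the implementation.  The class `com.example.MySimilarity` below is a hypothetical custom subclass, used here only as a placeholder:
  
  {{{
  <!-- Hypothetical custom Similarity; it must be loadable by Solr (see SolrPlugins) -->
  <similarity class="com.example.MySimilarity"/>
  }}}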