Posted to solr-commits@lucene.apache.org by Apache Wiki <wi...@apache.org> on 2009/02/05 23:18:33 UTC

[Solr Wiki] Update of "SchemaDesign" by GrantIngersoll

Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Solr Wiki" for change notification.

The following page has been changed by GrantIngersoll:
http://wiki.apache.org/solr/SchemaDesign

------------------------------------------------------------------------------
  = Sorting =
  There are two ways of sorting available in Solr 1.4: Lucene's sorting feature and function queries.
  == Lucene Sorting ==
- The Solr sort parameter uses the Lucene sorting tool. This creates an array containing an entry for every document in the index. Sorting is then done against this array. This array is cached across requests and so repeated sorts are fast.  If the field type is 'integer' the array contains only that value and thus is 4 bytes * the number of documents. If the field type is anything else, this integer array is created and then a separate array is also created with that field's data per entry. Sorting is also slower if the type is not an 'integer'.
+ The Solr sort parameter uses the Lucene sorting tool. This creates an array containing an entry for every document in the index using the !FieldCache. Sorting is then done against this array. This array is cached across requests using the !IndexReader and so repeated sorts are fast.  If the field type is 'integer' the array contains only that value and thus is 4 bytes * the number of documents. If the field type is anything else, this integer array is created and then a separate array is also created with that field's data per entry. Sorting is also slower if the type is not an 'integer'.
  However, range checks do not work on an 'integer' field. If you want range checks and fast sorting, you can create a pair of fields, one of each type, with a copyField directive:
  {{{
   <field name="popularity" type="sint" indexed="true" stored="true" multiValued="false"/>
@@ -17, +17 @@

   <copyField source="popularity" dest="popularitySort"/>
  }}}
  Note that because multiValued=false is the default for these types, storing a value directly to 'popularitySort' causes an indexing error: the field already receives a value from 'popularity' through the copyField directive. There is also no reason to store both fields, so 'popularitySort' is index-only.
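As a rough illustration of the "4 bytes * the number of documents" figure above, the following sketch estimates the size of the integer sort array. The function name and the 10 million document count are made up for the example, and the estimate ignores any per-array overhead and the second array created for non-integer types.

```python
def int_sort_array_bytes(num_docs: int, bytes_per_entry: int = 4) -> int:
    """Estimate the size of a per-document integer sort array.

    Hypothetical helper: one fixed-size entry per document in the index,
    4 bytes each for an 'integer' field, as described in the text.
    """
    if num_docs < 0:
        raise ValueError("num_docs must be non-negative")
    return num_docs * bytes_per_entry


if __name__ == "__main__":
    docs = 10_000_000  # assumed index size, for illustration only
    size = int_sort_array_bytes(docs)
    print(f"{docs} documents -> about {size / 1_000_000:.0f} MB per sorted integer field")
```

Each additional field you sort on gets its own array, so the estimate applies per sort field.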
+ 
+ === A Note on "sortable" FieldTypes ===
+ The name of the "sortable" !FieldTypes, such as sint and sdouble, is a bit of a misnomer. They are not needed for sorting in the sense described above, but they are needed for !RangeQuery queries. "Sortable" here means that the number is encoded so that its String form sorts correctly in lexicographic order. Without this encoding, the numbers 1..10 sort lexicographically as 1, 10, 2, 3... Using an sint remedies this. If you do not need !RangeQuery queries and only need to sort on the field, use an int, a double, or the equivalent type. You will save yourself time and memory.
+ 
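The lexicographic ordering problem described above can be reproduced in a few lines. This is only a sketch: the zero-padding used here is a stand-in chosen for illustration, not the encoding Solr's sortable types actually use, and it only handles non-negative integers. The function names are hypothetical.

```python
def lexicographic_order(numbers):
    """Sort numbers by their plain string form, as an unencoded index term would."""
    return sorted(str(n) for n in numbers)


def padded_order(numbers, width=10):
    """Sort numbers by a zero-padded string form, then decode back to ints.

    Zero-padding is an illustrative encoding only; it makes string order
    agree with numeric order for non-negative integers of at most `width` digits.
    """
    encoded = sorted(f"{n:0{width}d}" for n in numbers)
    return [int(term) for term in encoded]


if __name__ == "__main__":
    values = range(1, 11)
    print(lexicographic_order(values))  # plain strings: "10" lands right after "1"
    print(padded_order(values))         # padded strings: numeric order is preserved
```

Any encoding whose string order matches numeric order gives correct range comparisons over the indexed terms, which is why the sortable types matter for range queries rather than for sorting.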
  == Function Query Sorting ==
  Add this clause to your query string to sort the results using 'myIndexedField'. Do not use the 'sort=field+asc' parameter. See [FunctionQuery] for more.
  {{{
@@ -30, +34 @@

  == Phonemes ==
  Programmers are perfect spellers and expect the same of their users. A ''phoneme'' represents (roughly) the sound of one syllable. Phoneme-based searching can give users a better search experience. To support misspelled search words, phoneme filters cause the index to store phoneme-based representations of the text instead of the input itself. This only finds misspellings that sound like the original word.
  
- To create a phoneme-based field, you need a text filter stack that does not include stemming or stopwords, and add the  solr.PhoneticFilterFactory (see [AnalyzersTokenizersTokenFilters]) with one of the available encoders. This must be in both the indexing and query stack. Of the several available the "Double Metaphone" filter is the most popular and does well with non-English text. There are as yet no language-specific phoneme encoders.
+ To create a phoneme-based field, you need a text filter stack that does not include stemming or stopwords, and add the  solr.!PhoneticFilterFactory (see AnalyzersTokenizersTokenFilters) with one of the available encoders. This must be in both the indexing and query stack. Of the several available the "Double Metaphone" filter is the most popular and does well with non-English text. There are as yet no language-specific phoneme encoders.
  
- For another take on assisting spelling, see [SpellCheckComponent].
+ For another take on assisting spelling, see SpellCheckComponent.
  == Unicode processing ==
  Searching text in different languages is very difficult. The Latin1Accent filters reduce European accented characters to their US-ASCII equivalents: the French spelling ''protégé'' becomes the English spelling ''protege''.
  In Solr-1.3, use this in the filter stack of your "text" field type: