Posted to solr-commits@lucene.apache.org by Apache Wiki <wi...@apache.org> on 2010/07/03 02:52:40 UTC

[Solr Wiki] Update of "TermVectorComponent" by GrantIngersoll

The "TermVectorComponent" page has been changed by GrantIngersoll.
http://wiki.apache.org/solr/TermVectorComponent?action=diff&rev1=12&rev2=13

--------------------------------------------------

  = Introduction =
- 
  <!> Solr 1.4 <!>
  
  The Term Vector Component (TVC) is a !SearchComponent designed to return the per-document information that is stored when the termVectors attribute is set on a field:
+ 
  {{{
  <field name="features" type="text" indexed="true" stored="true" multiValued="true" termVectors="true" termPositions="true" termOffsets="true"/>
  }}}
- 
  For each document, the TVC can return the term vector, the term frequency, the inverse document frequency, and position and offset information.  As with most components, there are a number of options, which are outlined in the samples below.
  
  = Sample Usage =
- 
  All examples are based on using the Solr example.
  
  == Enabling the TVC ==
- 
  === Changes required in solrconfig.xml ===
- 
  You need to enable the TermVectorComponent in your Solr configuration:
  
  {{{
  <searchComponent name="tvComponent" class="org.apache.solr.handler.component.TermVectorComponent"/>
  }}}
- 
  A RequestHandler configuration using this component could look like this:
  
  {{{
  <requestHandler name="tvrh" class="org.apache.solr.handler.component.SearchHandler">
- 	<lst name="defaults">
+         <lst name="defaults">
- 		<bool name="tv">true</bool>
+                 <bool name="tv">true</bool>
- 	</lst>
+         </lst>
- 	<arr name="last-components">
+         <arr name="last-components">
- 		<str>tvComponent</str>
- 	</arr>
+                 <str>tvComponent</str>
+         </arr>
  </requestHandler>
  }}}
- 
  === HTTP Requests ===
- 
  {{{http://localhost:8983/solr/select/?q=*%3A*&version=2.2&start=0&rows=10&indent=on&qt=tvrh&tv=true}}}
  
  In the example, the component is associated with a request handler named tvrh, but you can associate it with any !RequestHandler.  To turn on the component for a request, add the {{{tv=true}}} parameter (or add it to your !RequestHandler defaults configuration).
  
- Example output:
- See TermVectorComponentExampleEnabled.
+ Example output: See TermVectorComponentExampleEnabled.
  
  == Options ==
- 
  {{{http://localhost:8983/solr/select/?q=*%3A*&version=2.2&start=0&rows=10&indent=on&qt=tvrh&tv=true&tv.tf=true&tv.df=true&tv.positions=true&tv.offsets=true}}}
  
   * tv.tf - Return document term frequency info per term in the document.
@@ -58, +49 @@

   * tv.tf_idf - Calculates tf*idf for each term.  Requires the parameters tv.tf and tv.df to be "true". This can be expensive. (not shown in example output)
  
  Alternatively, a shortcut that turns on all options is:
+ 
   * tv.all=true
  
  Example output: See TermVectorComponentExampleOptions.
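
The request above can also be assembled programmatically. The following is a minimal Python sketch, not part of Solr or SolrJ: the helper name `term_vector_url` is hypothetical, and the host, port, and the `tvrh` handler name are taken from the example configuration on this page. Note that `urlencode` percent-encodes the `*` and `:` characters of the query, which Solr should decode back to `*:*`.

```python
from urllib.parse import urlencode


def term_vector_url(base="http://localhost:8983/solr/select/",
                    query="*:*", handler="tvrh", **options):
    """Build a request URL with tv=true plus the given tv.* options.

    Hypothetical helper: each keyword argument becomes a tv.<name>
    parameter, for example tf=True becomes tv.tf=true.
    """
    params = [("q", query), ("qt", handler), ("tv", "true")]
    for name, enabled in sorted(options.items()):
        params.append((f"tv.{name}", "true" if enabled else "false"))
    return base + "?" + urlencode(params)


# Equivalent in spirit to the Options example request on this page.
url = term_vector_url(tf=True, df=True, positions=True, offsets=True)
print(url)
```

Passing `all=True` instead of the individual flags would produce the `tv.all=true` shortcut.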
  
  For schema requirements, see FieldOptionsByUseCase.
  
+ === Per Field Options ===
+ With https://issues.apache.org/jira/browse/SOLR-1556, it is now possible to specify per-field options, similar to the way per-field options work in faceting:
+ 
+  * f.fieldName.tv.tf - Turns on Term Frequency for the fieldName specified.
+  * The other options above follow the same f.fieldName.tv.option pattern.
+ 
+ If you specify a field but no per-field options for it, the general options are applied to that field.
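
As an illustration of the naming pattern, here is a small Python sketch that expands a mapping of fields to options into `f.fieldName.tv.option` parameters. The helper `per_field_params` is hypothetical, and the `features` field and `tvrh` handler come from the examples on this page; only the parameter naming is taken from the description above.

```python
from urllib.parse import urlencode


def per_field_params(field_options):
    """Expand {field: [option, ...]} into f.<field>.tv.<option>=true pairs.

    Hypothetical helper for illustration; it only builds parameter names.
    """
    params = {}
    for field, options in field_options.items():
        for option in options:
            params[f"f.{field}.tv.{option}"] = "true"
    return params


# Request term vectors for the "features" field, with term frequency
# and document frequency turned on for that field only.
params = {"q": "*:*", "qt": "tvrh", "tv": "true", "tv.fl": "features"}
params.update(per_field_params({"features": ["tf", "df"]}))
query_string = urlencode(params)
print(query_string)
```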
+ 
  == Other Options ==
- 
   * tv.fl - List of fields to get TV information from.  Optional.  If not specified, the fl parameter is used.
+   * As of https://issues.apache.org/jira/browse/SOLR-1556, if the field does not exist, an exception is thrown.
   * tv.docIds - List of Lucene document ids (not the Solr Unique Key) to get term vectors for.
  
+ == Warnings ==
+ https://issues.apache.org/jira/browse/SOLR-1556
+ 
+ If a requested field does not support the options specified, warnings are returned indicating which option the field does not support.  There are three types of warnings:
+ 
+  1. noTermVector - The field does not store term vectors
+  1. noPositions - The field does not store positions
+  1. noOffsets - The field does not store offsets
+ 
+ Each of these items is a list of strings naming the fields that do not support the specified option.
+ 
  == SolrJ ==
- 
  Neither the SolrQuery class nor the QueryResponse class offers specific methods for setting TermVectorComponent parameters or reading the "termVectors" output. However, there is a patch for it: [[https://issues.apache.org/jira/browse/SOLR-949|SOLR-949]].
  
  == History ==