Posted to solr-commits@lucene.apache.org by Apache Wiki <wi...@apache.org> on 2010/07/03 02:52:40 UTC
[Solr Wiki] Update of "TermVectorComponent" by GrantIngersoll
Dear Wiki user,
You have subscribed to a wiki page or wiki category on "Solr Wiki" for change notification.
The "TermVectorComponent" page has been changed by GrantIngersoll.
http://wiki.apache.org/solr/TermVectorComponent?action=diff&rev1=12&rev2=13
--------------------------------------------------
= Introduction =
-
<!> Solr 1.4 <!>
The Term Vector Component (TVC) is a !SearchComponent that returns the term vector information stored for a document when the termVectors attribute is set on a field:
+
{{{
<field name="features" type="text" indexed="true" stored="true" multiValued="true" termVectors="true" termPositions="true" termOffsets="true"/>
}}}
-
For each document, the TVC can return the term vector, term frequency, inverse document frequency, position, and offset information. As with most components, a number of options are available; they are outlined in the samples below.
= Sample Usage =
-
All examples are based on the Solr example.
== Enabling the TVC ==
-
=== Changes required in solrconfig.xml ===
-
You need to enable the TermVectorComponent in your Solr configuration:
{{{
<searchComponent name="tvComponent" class="org.apache.solr.handler.component.TermVectorComponent"/>
}}}
-
A RequestHandler configuration using this component could look like this:
{{{
<requestHandler name="tvrh" class="org.apache.solr.handler.component.SearchHandler">
- <lst name="defaults">
+ <lst name="defaults">
- <bool name="tv">true</bool>
+ <bool name="tv">true</bool>
- </lst>
+ </lst>
- <arr name="last-components">
+ <arr name="last-components">
- <str>tvComponent</str>
- </arr>
+ <str>tvComponent</str>
+ </arr>
</requestHandler>
}}}
-
=== HTTP Requests ===
-
{{{http://localhost:8983/solr/select/?q=*%3A*&version=2.2&start=0&rows=10&indent=on&qt=tvrh&tv=true}}}
In the example, the component is associated with a request handler named tvrh, but you can associate it with any !RequestHandler. To turn on the component for a request, add the {{{tv=true}}} parameter (or add it to your !RequestHandler defaults configuration).
- Example output:
- See TermVectorComponentExampleEnabled.
+ Example output: See TermVectorComponentExampleEnabled.
== Options ==
-
{{{http://localhost:8983/solr/select/?q=*%3A*&version=2.2&start=0&rows=10&indent=on&qt=tvrh&tv=true&tv.tf=true&tv.df=true&tv.positions=true&tv.offsets=true}}}
* tv.tf - Return document term frequency info per term in the document.
@@ -58, +49 @@
* tv.tf_idf - Calculates tf*idf for each term. Requires the parameters tv.tf and tv.df to be "true". This can be expensive. (not shown in example output)
Alternatively, a shortcut for all options on is:
+
* tv.all=true
Example output: See TermVectorComponentExampleOptions.
For schema requirements, see FieldOptionsByUseCase.
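
The option parameters above are ordinary query parameters, so a request URL can be assembled programmatically. The following is a minimal Python sketch, assuming the host, port, and tvrh handler name from the example above. The helper name build_tv_url is hypothetical and not part of Solr.

```python
from urllib.parse import urlencode

# Base URL taken from the example request above; adjust for your setup.
SOLR_SELECT_URL = "http://localhost:8983/solr/select/"


def build_tv_url(query="*:*", handler="tvrh", rows=10, **tv_options):
    """Build a select URL that turns on the TVC plus the given tv.* options.

    Keyword arguments map to tv.* parameters, e.g. tf=True becomes tv.tf=true.
    build_tv_url is a hypothetical helper for illustration only.
    """
    params = [("q", query), ("qt", handler), ("rows", str(rows)), ("tv", "true")]
    # Sort the options so the generated URL is stable between runs.
    for name, enabled in sorted(tv_options.items()):
        params.append((f"tv.{name}", "true" if enabled else "false"))
    return SOLR_SELECT_URL + "?" + urlencode(params)


url = build_tv_url(tf=True, df=True, positions=True, offsets=True)
print(url)
```

Passing every option explicitly as name=value avoids relying on how a parameter without a value is interpreted.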
+ === Per Field Options ===
+ With https://issues.apache.org/jira/browse/SOLR-1556, it is now possible to specify per-field options, similar to the way per-field options work in faceting, for example:
+
+ * f.fieldName.tv.tf - Turns on Term Frequency for the fieldName specified.
+ * The other options above can be set per field in the same way.
+
+ If you specify a field but no per-field options for it, the general options apply to that field.
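
The f.fieldName.tv.option naming pattern can be generated mechanically. The sketch below is illustrative only: per_field_tv_params is a hypothetical helper, and the features field name is borrowed from the schema snippet in the introduction.

```python
from urllib.parse import urlencode


def per_field_tv_params(field_options):
    """Turn {"field": {"tf": True}} into [("f.field.tv.tf", "true"), ...].

    per_field_tv_params is a hypothetical helper, not part of Solr or SolrJ.
    """
    params = []
    for field, options in field_options.items():
        for option, enabled in options.items():
            params.append((f"f.{field}.tv.{option}", "true" if enabled else "false"))
    return params


# Turn the component on, select one field, and set two options for that field only.
params = [("tv", "true"), ("tv.fl", "features")]
params += per_field_tv_params({"features": {"tf": True, "positions": True}})
query_string = urlencode(params)
print(query_string)
```

The resulting query string can be appended to a select URL such as the ones shown under Sample Usage.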
+
== Other Options ==
-
* tv.fl - List of fields to get TV information from. Optional. If not specified, the fl parameter is used.
+ * As of https://issues.apache.org/jira/browse/SOLR-1556, if the field does not exist, an exception is thrown.
* tv.docIds - List of Lucene document ids (not the Solr Unique Key) to get term vectors for.
+ == Warnings ==
+ https://issues.apache.org/jira/browse/SOLR-1556
+
+ If a requested field does not support the options specified, the response includes warnings naming the unsupported option for that field. There are three types of warnings:
+
+ 1. noTermVector - The field does not store term vectors
+ 1. noPositions - The field does not store positions
+ 1. noOffsets - The field does not store offsets
+
+ Each of these items is a list of strings containing the names of the fields that do not support the specified option.
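
A client can turn these warnings into readable messages. The sketch below assumes the warnings have already been extracted from the response into a plain dict that maps each warning type to its list of field names. That dict shape is an assumption for illustration, not the documented response format, and describe_warnings is a hypothetical helper.

```python
# Warning types as listed above, with a short description of each.
WARNING_MEANINGS = {
    "noTermVector": "does not store term vectors",
    "noPositions": "does not store positions",
    "noOffsets": "does not store offsets",
}


def describe_warnings(warnings):
    """Return one message per (field, warning type) pair.

    `warnings` is assumed to be a dict such as {"noOffsets": ["features"]};
    the real response layout may differ and would need to be parsed first.
    """
    messages = []
    for kind, meaning in WARNING_MEANINGS.items():
        for field in warnings.get(kind, []):
            messages.append(f"{field}: {meaning}")
    return messages


# Hypothetical sample data, not real Solr output.
sample = {"noPositions": ["name"], "noOffsets": ["name", "features"]}
for message in describe_warnings(sample):
    print(message)
```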
+
== SolrJ ==
-
Neither the SolrQuery class nor the QueryResponse class offers specific methods to set TermVectorComponent parameters or to read the "termVectors" output. However, there is a patch for it: [[https://issues.apache.org/jira/browse/SOLR-949|SOLR-949]].
== History ==