You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@lucene.apache.org by us...@apache.org on 2012/12/08 13:11:17 UTC

svn commit: r1418653 - in /lucene/dev/branches/branch_4x: ./ lucene/ lucene/core/ lucene/core/src/java/org/apache/lucene/search/NumericRangeQuery.java lucene/core/src/java/org/apache/lucene/search/doc-files/

Author: uschindler
Date: Sat Dec  8 12:11:16 2012
New Revision: 1418653

URL: http://svn.apache.org/viewvc?rev=1418653&view=rev
Log:
Merged revision(s) 1418652 from lucene/dev/trunk:
LUCENE-4592: Improve Javadocs of NumericRangeQuery

Added:
    lucene/dev/branches/branch_4x/lucene/core/src/java/org/apache/lucene/search/doc-files/
      - copied from r1418652, lucene/dev/trunk/lucene/core/src/java/org/apache/lucene/search/doc-files/
Modified:
    lucene/dev/branches/branch_4x/   (props changed)
    lucene/dev/branches/branch_4x/lucene/   (props changed)
    lucene/dev/branches/branch_4x/lucene/core/   (props changed)
    lucene/dev/branches/branch_4x/lucene/core/src/java/org/apache/lucene/search/NumericRangeQuery.java

Modified: lucene/dev/branches/branch_4x/lucene/core/src/java/org/apache/lucene/search/NumericRangeQuery.java
URL: http://svn.apache.org/viewvc/lucene/dev/branches/branch_4x/lucene/core/src/java/org/apache/lucene/search/NumericRangeQuery.java?rev=1418653&r1=1418652&r2=1418653&view=diff
==============================================================================
--- lucene/dev/branches/branch_4x/lucene/core/src/java/org/apache/lucene/search/NumericRangeQuery.java (original)
+++ lucene/dev/branches/branch_4x/lucene/core/src/java/org/apache/lucene/search/NumericRangeQuery.java Sat Dec  8 12:11:16 2012
@@ -73,14 +73,9 @@ import org.apache.lucene.index.Term; // 
  * details.
  *
  * <p>This query defaults to {@linkplain
- * MultiTermQuery#CONSTANT_SCORE_AUTO_REWRITE_DEFAULT} for
- * 32 bit (int/float) ranges with precisionStep &le;8 and 64
- * bit (long/double) ranges with precisionStep &le;6.
- * Otherwise it uses {@linkplain
- * MultiTermQuery#CONSTANT_SCORE_FILTER_REWRITE} as the
- * number of terms is likely to be high.  With precision
- * steps of &le;4, this query can be run with one of the
- * BooleanQuery rewrite methods without changing
+ * MultiTermQuery#CONSTANT_SCORE_AUTO_REWRITE_DEFAULT}.
+ * With precision steps of &le;4, this query can be run with
+ * one of the BooleanQuery rewrite methods without changing
  * BooleanQuery's default max clause count.
  *
  * <br><h3>How it works</h3>
@@ -117,17 +112,29 @@ import org.apache.lucene.index.Term; // 
  *
  * <a name="precisionStepDesc"><h3>Precision Step</h3>
  * <p>You can choose any <code>precisionStep</code> when encoding values.
- * Lower step values mean more precisions and so more terms in index (and index gets larger).
- * On the other hand, the maximum number of terms to match reduces, which optimized query speed.
- * The formula to calculate the maximum term count is:
- * <pre>
- *  n = [ (bitsPerValue/precisionStep - 1) * (2^precisionStep - 1 ) * 2 ] + (2^precisionStep - 1 )
- * </pre>
- * <p><em>(this formula is only correct, when <code>bitsPerValue/precisionStep</code> is an integer;
- * in other cases, the value must be rounded up and the last summand must contain the modulo of the division as
- * precision step)</em>.
- * For longs stored using a precision step of 4, <code>n = 15*15*2 + 15 = 465</code>, and for a precision
- * step of 2, <code>n = 31*3*2 + 3 = 189</code>. But the faster search speed is reduced by more seeking
+ * Lower step values mean more precisions and so more terms in index (and index gets larger). The number
+ * of indexed terms per value is (those are generated by {@link NumericTokenStream}):
+ * <p style="font-family:serif">
+ * &nbsp;&nbsp;indexedTermsPerValue = <b>ceil</b><big>(</big>bitsPerValue / precisionStep<big>)</big>
+ * </p>
+ * As the lower precision terms are shared by many values, the additional terms only
+ * slightly grow the term dictionary (approx. 7% for <code>precisionStep=4</code>), but have a larger
+ * impact on the postings (the postings file will have  more entries, as every document is linked to
+ * <code>indexedTermsPerValue</code> terms instead of one). The formula to estimate the growth
+ * of the term dictionary in comparison to one term per value:
+ * <p>
+ * <!-- the formula in the alt attribute was transformed from latex to PNG with http://1.618034.com/latex.php (with 110 dpi): -->
+ * &nbsp;&nbsp;<img src="doc-files/nrq-formula-1.png" alt="\mathrm{termDictOverhead} = \sum\limits_{i=0}^{\mathrm{indexedTermsPerValue}-1} \frac{1}{2^{\mathrm{precisionStep}\cdot i}}" />
+ * </p>
+ * <p>On the other hand, if the <code>precisionStep</code> is smaller, the maximum number of terms to match reduces,
+ * which optimizes query speed. The formula to calculate the maximum number of terms that will be visited while
+ * executing the query is:
+ * <p>
+ * <!-- the formula in the alt attribute was transformed from latex to PNG with http://1.618034.com/latex.php (with 110 dpi): -->
+ * &nbsp;&nbsp;<img src="doc-files/nrq-formula-2.png" alt="\mathrm{maxQueryTerms} = \left[ \left( \mathrm{indexedTermsPerValue} - 1 \right) \cdot \left(2^\mathrm{precisionStep} - 1 \right) \cdot 2 \right] + \left( 2^\mathrm{precisionStep} - 1 \right)" />
+ * </p>
+ * <p>For longs stored using a precision step of 4, <code>maxQueryTerms = 15*15*2 + 15 = 465</code>, and for a precision
+ * step of 2, <code>maxQueryTerms = 31*3*2 + 3 = 189</code>. But the faster search speed is reduced by more seeking
  * in the term enum of the index. Because of this, the ideal <code>precisionStep</code> value can only
  * be found out by testing. <b>Important:</b> You can index with a lower precision step value and test search speed
  * using a multiple of the original step value.</p>
@@ -143,7 +150,7 @@ import org.apache.lucene.index.Term; // 
  *  per value in the index and querying is as slow as a conventional {@link TermRangeQuery}. But it can be used
  *  to produce fields, that are solely used for sorting (in this case simply use {@link Integer#MAX_VALUE} as
  *  <code>precisionStep</code>). Using {@link IntField},
- * {@link LongField}, {@link FloatField} or {@link DoubleField} for sorting
+ *  {@link LongField}, {@link FloatField} or {@link DoubleField} for sorting
  *  is ideal, because building the field cache is much faster than with text-only numbers.
  *  These fields have one term per value and therefore also work with term enumeration for building distinct lists
  *  (e.g. facets / preselected values to search for).