You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@lucene.apache.org by "Ning Li (JIRA)" <ji...@apache.org> on 2009/02/17 20:16:59 UTC

[jira] Created: (LUCENE-1541) Trie range - make trie range indexing more flexible

Trie range - make trie range indexing more flexible
---------------------------------------------------

                 Key: LUCENE-1541
                 URL: https://issues.apache.org/jira/browse/LUCENE-1541
             Project: Lucene - Java
          Issue Type: Improvement
          Components: contrib/*
            Reporter: Ning Li
            Priority: Minor


In the current trie range implementation, a single precision step is specified. With a large precision step (say 8), a value is indexed in fewer terms (8) but the number of terms for a range can be large. With a small precision step (say 2), the number of terms for a range is smaller but a value is indexed in more terms (32).

We want to add an option that different precision steps can be set for different precisions. An expert can use this option to keep the number of terms for a range small and at the same time index a value in a small number of terms. See the discussion in LUCENE-1470 that results in this issue.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


[jira] Updated: (LUCENE-1541) Trie range - make trie range indexing more flexible

Posted by "Uwe Schindler (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-1541?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Uwe Schindler updated LUCENE-1541:
----------------------------------

        Lucene Fields: [New, Patch Available]  (was: [New])
    Affects Version/s: 2.9
        Fix Version/s: 2.9

> Trie range - make trie range indexing more flexible
> ---------------------------------------------------
>
>                 Key: LUCENE-1541
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1541
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: contrib/*
>    Affects Versions: 2.9
>            Reporter: Ning Li
>            Assignee: Uwe Schindler
>            Priority: Minor
>             Fix For: 2.9
>
>         Attachments: LUCENE-1541.patch
>
>
> In the current trie range implementation, a single precision step is specified. With a large precision step (say 8), a value is indexed in fewer terms (8) but the number of terms for a range can be large. With a small precision step (say 2), the number of terms for a range is smaller but a value is indexed in more terms (32).
> We want to add an option that different precision steps can be set for different precisions. An expert can use this option to keep the number of terms for a range small and at the same time index a value in a small number of terms. See the discussion in LUCENE-1470 that results in this issue.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


[jira] Commented: (LUCENE-1541) Trie range - make trie range indexing more flexible

Posted by "Ning Li (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-1541?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12678045#action_12678045 ] 

Ning Li commented on LUCENE-1541:
---------------------------------

An index size comparison will be great.

> Trie range - make trie range indexing more flexible
> ---------------------------------------------------
>
>                 Key: LUCENE-1541
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1541
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: contrib/*
>    Affects Versions: 2.9
>            Reporter: Ning Li
>            Assignee: Uwe Schindler
>            Priority: Minor
>             Fix For: 2.9
>
>         Attachments: LUCENE-1541.patch, LUCENE-1541.patch
>
>
> In the current trie range implementation, a single precision step is specified. With a large precision step (say 8), a value is indexed in fewer terms (8) but the number of terms for a range can be large. With a small precision step (say 2), the number of terms for a range is smaller but a value is indexed in more terms (32).
> We want to add an option that different precision steps can be set for different precisions. An expert can use this option to keep the number of terms for a range small and at the same time index a value in a small number of terms. See the discussion in LUCENE-1470 that results in this issue.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


[jira] Commented: (LUCENE-1541) Trie range - make trie range indexing more flexible

Posted by "Uwe Schindler (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-1541?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12675695#action_12675695 ] 

Uwe Schindler commented on LUCENE-1541:
---------------------------------------

After thinking one night longer about the whole issue, I suspect, that non-equidistant precision steps are really needed:

Ning's comment is correct about the number of terms. But if you only index long values from, e.g., 0L to 10000L, you create a lot of terms for shift values 0 to 14 (because the terms are in this range). For shift values 15 to 63, the term is always the same constant term. The index' TermEnum so contains not many additional values (because its only *one* term for *all* documents), only additional TermDocs entries are created. It's the same like adding one "flag" term to all documents. This does not use much additional space in index. When you query a range, these terms are never used, but they do not hurt.

The additional space for the trie terms is generated by higher precision (lower shift) values. If you index with precision step 4 or 2 instead of a precision step of 8, you create a lot of *different* terms for the lower shift values. The constant terms in the higher shifts are still always the same and does not consume much space.

I will create a small comparison on index size for long values without higher bits, but I suspect, that index size without lower precision terms reduces space significant. If this is the case, I do not think the additional complexity of the API is needed for this low impact. If somebody really wants to optimize index size so much, he can create a optimized fork of TrieRange in his project that indexes with non-equidistant precision steps. On the other hand, I would suggest to use ints/floats instead of longs/doubles, if only lower precision is needed. In this case, less terms will be created. For floats in comparision to doubles, the effect will be bigger (because even small doubles use a lot of hgher precision bits), a non-equidistant precision step will not help very much here.

> Trie range - make trie range indexing more flexible
> ---------------------------------------------------
>
>                 Key: LUCENE-1541
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1541
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: contrib/*
>    Affects Versions: 2.9
>            Reporter: Ning Li
>            Assignee: Uwe Schindler
>            Priority: Minor
>             Fix For: 2.9
>
>         Attachments: LUCENE-1541.patch, LUCENE-1541.patch
>
>
> In the current trie range implementation, a single precision step is specified. With a large precision step (say 8), a value is indexed in fewer terms (8) but the number of terms for a range can be large. With a small precision step (say 2), the number of terms for a range is smaller but a value is indexed in more terms (32).
> We want to add an option that different precision steps can be set for different precisions. An expert can use this option to keep the number of terms for a range small and at the same time index a value in a small number of terms. See the discussion in LUCENE-1470 that results in this issue.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


[jira] Issue Comment Edited: (LUCENE-1541) Trie range - make trie range indexing more flexible

Posted by "Uwe Schindler (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-1541?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12675695#action_12675695 ] 

thetaphi edited comment on LUCENE-1541 at 2/22/09 11:21 AM:
-----------------------------------------------------------------

After thinking one night longer about the whole issue, I doubt, that non-equidistant precision steps are really needed:

Ning's comment is correct about the number of terms. But if you only index long values from, e.g., 0L to 10000L, you create a lot of terms for shift values 0 to 14 (because the terms are in this range). For shift values 15 to 63, the term is always the same constant term. The index' TermEnum so contains not many additional values (because its only *one* term for *all* documents), only additional TermDocs entries are created. It's the same like adding one "flag" term to all documents. This does not use much additional space in index. When you query a range, these terms are never used, but they do not hurt.

The additional space for the trie terms is generated by higher precision (lower shift) values. If you index with precision step 4 or 2 instead of a precision step of 8, you create a lot of *different* terms for the lower shift values. The constant terms in the higher shifts are still always the same and does not consume much space.

I will create a small comparison on index size for long values without higher bits, but I doubt, that index size without lower precision terms reduces space significant. If this is the case, I do not think the additional complexity of the API is needed for this low impact. If somebody really wants to optimize index size so much, he can create a optimized fork of TrieRange in his project that indexes with non-equidistant precision steps. On the other hand, I would suggest to use ints/floats instead of longs/doubles, if only lower precision is needed. In this case, less terms will be created. For floats in comparision to doubles, the effect will be bigger (because even small doubles use a lot of hgher precision bits), a non-equidistant precision step will not help very much here.

      was (Author: thetaphi):
    After thinking one night longer about the whole issue, I suspect, that non-equidistant precision steps are really needed:

Ning's comment is correct about the number of terms. But if you only index long values from, e.g., 0L to 10000L, you create a lot of terms for shift values 0 to 14 (because the terms are in this range). For shift values 15 to 63, the term is always the same constant term. The index' TermEnum so contains not many additional values (because its only *one* term for *all* documents), only additional TermDocs entries are created. It's the same like adding one "flag" term to all documents. This does not use much additional space in index. When you query a range, these terms are never used, but they do not hurt.

The additional space for the trie terms is generated by higher precision (lower shift) values. If you index with precision step 4 or 2 instead of a precision step of 8, you create a lot of *different* terms for the lower shift values. The constant terms in the higher shifts are still always the same and does not consume much space.

I will create a small comparison on index size for long values without higher bits, but I suspect, that index size without lower precision terms reduces space significant. If this is the case, I do not think the additional complexity of the API is needed for this low impact. If somebody really wants to optimize index size so much, he can create a optimized fork of TrieRange in his project that indexes with non-equidistant precision steps. On the other hand, I would suggest to use ints/floats instead of longs/doubles, if only lower precision is needed. In this case, less terms will be created. For floats in comparision to doubles, the effect will be bigger (because even small doubles use a lot of hgher precision bits), a non-equidistant precision step will not help very much here.
  
> Trie range - make trie range indexing more flexible
> ---------------------------------------------------
>
>                 Key: LUCENE-1541
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1541
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: contrib/*
>    Affects Versions: 2.9
>            Reporter: Ning Li
>            Assignee: Uwe Schindler
>            Priority: Minor
>             Fix For: 2.9
>
>         Attachments: LUCENE-1541.patch, LUCENE-1541.patch
>
>
> In the current trie range implementation, a single precision step is specified. With a large precision step (say 8), a value is indexed in fewer terms (8) but the number of terms for a range can be large. With a small precision step (say 2), the number of terms for a range is smaller but a value is indexed in more terms (32).
> We want to add an option that different precision steps can be set for different precisions. An expert can use this option to keep the number of terms for a range small and at the same time index a value in a small number of terms. See the discussion in LUCENE-1470 that results in this issue.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


[jira] Commented: (LUCENE-1541) Trie range - make trie range indexing more flexible

Posted by "Uwe Schindler (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-1541?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12719769#action_12719769 ] 

Uwe Schindler commented on LUCENE-1541:
---------------------------------------

I see no real use in it, it does not affect query performance, only index size. Maybe we should move it to 3.1 until I have some time, but the Payload thing is more interesting and maybe this can be combined.

> Trie range - make trie range indexing more flexible
> ---------------------------------------------------
>
>                 Key: LUCENE-1541
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1541
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: contrib/*
>    Affects Versions: 2.9
>            Reporter: Ning Li
>            Assignee: Uwe Schindler
>            Priority: Minor
>             Fix For: 2.9
>
>         Attachments: LUCENE-1541.patch, LUCENE-1541.patch
>
>
> In the current trie range implementation, a single precision step is specified. With a large precision step (say 8), a value is indexed in fewer terms (8) but the number of terms for a range can be large. With a small precision step (say 2), the number of terms for a range is smaller but a value is indexed in more terms (32).
> We want to add an option that different precision steps can be set for different precisions. An expert can use this option to keep the number of terms for a range small and at the same time index a value in a small number of terms. See the discussion in LUCENE-1470 that results in this issue.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


[jira] Updated: (LUCENE-1541) Trie range - make trie range indexing more flexible

Posted by "Michael McCandless (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-1541?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael McCandless updated LUCENE-1541:
---------------------------------------

    Fix Version/s:     (was: 2.9)
                   3.1

OK, moving out to 3.1.

> Trie range - make trie range indexing more flexible
> ---------------------------------------------------
>
>                 Key: LUCENE-1541
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1541
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: contrib/*
>    Affects Versions: 2.9
>            Reporter: Ning Li
>            Assignee: Uwe Schindler
>            Priority: Minor
>             Fix For: 3.1
>
>         Attachments: LUCENE-1541.patch, LUCENE-1541.patch
>
>
> In the current trie range implementation, a single precision step is specified. With a large precision step (say 8), a value is indexed in fewer terms (8) but the number of terms for a range can be large. With a small precision step (say 2), the number of terms for a range is smaller but a value is indexed in more terms (32).
> We want to add an option that different precision steps can be set for different precisions. An expert can use this option to keep the number of terms for a range small and at the same time index a value in a small number of terms. See the discussion in LUCENE-1470 that results in this issue.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


[jira] Updated: (LUCENE-1541) Trie range - make trie range indexing more flexible

Posted by "Uwe Schindler (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-1541?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Uwe Schindler updated LUCENE-1541:
----------------------------------

    Attachment: LUCENE-1541.patch

A first patch, completely untested:
 - Javadocs need to be updated
 - Tests with non-equidistant precision steps must be added
 - Warning: Method signatures of RangeBuilders changed (order and contents, but not types!!!)
 - Maybe additional shortcuts in RangeFilters needed (the expert ctor now takes field[] and precisionSteps[])

Does this look like an API, that may work for you? Currently I am not so happy with the additional loop that determines the length of the trie array (in trieCodeLong/Int) and the additional array allocations needed.

> Trie range - make trie range indexing more flexible
> ---------------------------------------------------
>
>                 Key: LUCENE-1541
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1541
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: contrib/*
>            Reporter: Ning Li
>            Assignee: Uwe Schindler
>            Priority: Minor
>         Attachments: LUCENE-1541.patch
>
>
> In the current trie range implementation, a single precision step is specified. With a large precision step (say 8), a value is indexed in fewer terms (8) but the number of terms for a range can be large. With a small precision step (say 2), the number of terms for a range is smaller but a value is indexed in more terms (32).
> We want to add an option that different precision steps can be set for different precisions. An expert can use this option to keep the number of terms for a range small and at the same time index a value in a small number of terms. See the discussion in LUCENE-1470 that results in this issue.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


[jira] Commented: (LUCENE-1541) Trie range - make trie range indexing more flexible

Posted by "Ning Li (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-1541?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12675390#action_12675390 ] 

Ning Li commented on LUCENE-1541:
---------------------------------

When one precision step is given, it is converted to the representation. Then no array creation is necessary. But something like TrieUtils.FieldConfiguration would be better. Besides the field name and the precision steps, either it should also contain a type (long/int) or a subclass is created for each type. It can be used both at indexing time and at query time.

> Trie range - make trie range indexing more flexible
> ---------------------------------------------------
>
>                 Key: LUCENE-1541
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1541
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: contrib/*
>    Affects Versions: 2.9
>            Reporter: Ning Li
>            Assignee: Uwe Schindler
>            Priority: Minor
>             Fix For: 2.9
>
>         Attachments: LUCENE-1541.patch
>
>
> In the current trie range implementation, a single precision step is specified. With a large precision step (say 8), a value is indexed in fewer terms (8) but the number of terms for a range can be large. With a small precision step (say 2), the number of terms for a range is smaller but a value is indexed in more terms (32).
> We want to add an option that different precision steps can be set for different precisions. An expert can use this option to keep the number of terms for a range small and at the same time index a value in a small number of terms. See the discussion in LUCENE-1470 that results in this issue.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


[jira] Commented: (LUCENE-1541) Trie range - make trie range indexing more flexible

Posted by "Michael McCandless (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-1541?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12719766#action_12719766 ] 

Michael McCandless commented on LUCENE-1541:
--------------------------------------------

Uwe, what's the plan on this issue...?  Should it wait until 3.1?

> Trie range - make trie range indexing more flexible
> ---------------------------------------------------
>
>                 Key: LUCENE-1541
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1541
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: contrib/*
>    Affects Versions: 2.9
>            Reporter: Ning Li
>            Assignee: Uwe Schindler
>            Priority: Minor
>             Fix For: 2.9
>
>         Attachments: LUCENE-1541.patch, LUCENE-1541.patch
>
>
> In the current trie range implementation, a single precision step is specified. With a large precision step (say 8), a value is indexed in fewer terms (8) but the number of terms for a range can be large. With a small precision step (say 2), the number of terms for a range is smaller but a value is indexed in more terms (32).
> We want to add an option that different precision steps can be set for different precisions. An expert can use this option to keep the number of terms for a range small and at the same time index a value in a small number of terms. See the discussion in LUCENE-1470 that results in this issue.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


[jira] Commented: (LUCENE-1541) Trie range - make trie range indexing more flexible

Posted by "Ning Li (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-1541?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12675212#action_12675212 ] 

Ning Li commented on LUCENE-1541:
---------------------------------

If you are *really* concerned with the additional loop and the additional array allocations, a long can be used to represent the precision steps. For example, precision steps 2-2-2-2-8-8-8-8-8-16 are represented as 0x80008080808080aa. Then bitCount, shift and numberOfTrailingZeros can be used to determine the length of the trie array and the individual precision steps. Hmm, we still have to support Java 1.4?

> Trie range - make trie range indexing more flexible
> ---------------------------------------------------
>
>                 Key: LUCENE-1541
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1541
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: contrib/*
>    Affects Versions: 2.9
>            Reporter: Ning Li
>            Assignee: Uwe Schindler
>            Priority: Minor
>             Fix For: 2.9
>
>         Attachments: LUCENE-1541.patch
>
>
> In the current trie range implementation, a single precision step is specified. With a large precision step (say 8), a value is indexed in fewer terms (8) but the number of terms for a range can be large. With a small precision step (say 2), the number of terms for a range is smaller but a value is indexed in more terms (32).
> We want to add an option that different precision steps can be set for different precisions. An expert can use this option to keep the number of terms for a range small and at the same time index a value in a small number of terms. See the discussion in LUCENE-1470 that results in this issue.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


[jira] Assigned: (LUCENE-1541) Trie range - make trie range indexing more flexible

Posted by "Uwe Schindler (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-1541?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Uwe Schindler reassigned LUCENE-1541:
-------------------------------------

    Assignee: Uwe Schindler

I prepare a patch.

Something to note: Choosing the correct precision step may be very difficult. E.g. for floats and doubles, you must exactly know, how the ieee-754 bitmap looks like to be effective; this is why I recommend for beginners to use a equidistant precision step. For normal integer values or fixed point values (like prices in cents as long), it is more meaningful. For doubles/floats, you must know, how many bits of the integer representation represent sign (1, thats simple), exponent and mantissa. But for experts, it may be ok. I will change only the expert functions to accept a array of precionStep values.

> Trie range - make trie range indexing more flexible
> ---------------------------------------------------
>
>                 Key: LUCENE-1541
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1541
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: contrib/*
>            Reporter: Ning Li
>            Assignee: Uwe Schindler
>            Priority: Minor
>
> In the current trie range implementation, a single precision step is specified. With a large precision step (say 8), a value is indexed in fewer terms (8) but the number of terms for a range can be large. With a small precision step (say 2), the number of terms for a range is smaller but a value is indexed in more terms (32).
> We want to add an option that different precision steps can be set for different precisions. An expert can use this option to keep the number of terms for a range small and at the same time index a value in a small number of terms. See the discussion in LUCENE-1470 that results in this issue.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


[jira] Commented: (LUCENE-1541) Trie range - make trie range indexing more flexible

Posted by "Uwe Schindler (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-1541?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12675269#action_12675269 ] 

Uwe Schindler commented on LUCENE-1541:
---------------------------------------

That would be a possibility, but not understandable for users :-). My problem was, that for each invocation of the simple trieCodeLong/Int() methods (taking only one precision step), the array is created in the inner expert method call. Maybe I revert to use the old code for this specific case here. When indexing documents, you need to call this method very often.

But Java5 uses the same technique for varargs and also allocates the array on each call, so I do not think this is a real problem.

Maybe another solution for all this (and would make API simplier and more extensible without changing method parameters) would be to use a helper class TrieUtils.FieldConfiguration that contains the precision steps and field names (and maybe more in future). A user could instantiate one of this helpers for each field and reuse it, passing it to the relevant methods (accepting field names and precision steps). These methods/ctors would then only exist once (not overloaded). 

I wanted also talk with Yonik about it, what he thinks.

> Trie range - make trie range indexing more flexible
> ---------------------------------------------------
>
>                 Key: LUCENE-1541
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1541
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: contrib/*
>    Affects Versions: 2.9
>            Reporter: Ning Li
>            Assignee: Uwe Schindler
>            Priority: Minor
>             Fix For: 2.9
>
>         Attachments: LUCENE-1541.patch
>
>
> In the current trie range implementation, a single precision step is specified. With a large precision step (say 8), a value is indexed in fewer terms (8) but the number of terms for a range can be large. With a small precision step (say 2), the number of terms for a range is smaller but a value is indexed in more terms (32).
> We want to add an option that different precision steps can be set for different precisions. An expert can use this option to keep the number of terms for a range small and at the same time index a value in a small number of terms. See the discussion in LUCENE-1470 that results in this issue.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


[jira] Updated: (LUCENE-1541) Trie range - make trie range indexing more flexible

Posted by "Uwe Schindler (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-1541?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Uwe Schindler updated LUCENE-1541:
----------------------------------

    Attachment: LUCENE-1541.patch

Updated patch (removed the API change in RangeBuilder), as not related to this issue. This patch also restores the original trieCodeLong/Int, that uses the equidistant precision step, so the indexing is faster, because no extra loop needed here.
Still missing are tests and javadocs, this still a early version.

> Trie range - make trie range indexing more flexible
> ---------------------------------------------------
>
>                 Key: LUCENE-1541
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1541
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: contrib/*
>    Affects Versions: 2.9
>            Reporter: Ning Li
>            Assignee: Uwe Schindler
>            Priority: Minor
>             Fix For: 2.9
>
>         Attachments: LUCENE-1541.patch, LUCENE-1541.patch
>
>
> In the current trie range implementation, a single precision step is specified. With a large precision step (say 8), a value is indexed in fewer terms (8) but the number of terms for a range can be large. With a small precision step (say 2), the number of terms for a range is smaller but a value is indexed in more terms (32).
> We want to add an option that different precision steps can be set for different precisions. An expert can use this option to keep the number of terms for a range small and at the same time index a value in a small number of terms. See the discussion in LUCENE-1470 that results in this issue.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org