Posted to dev@lucene.apache.org by "Woody Anderson (JIRA)" <ji...@apache.org> on 2010/07/18 03:44:50 UTC

[jira] Created: (LUCENE-2544) Add 'divisor' to NumericField, allows for easy storage of full precision data, but indexing *starting* at lower precision.

Add 'divisor' to NumericField, allows for easy storage of full precision data, but indexing *starting* at lower precision.
--------------------------------------------------------------------------------------------------------------------------

                 Key: LUCENE-2544
                 URL: https://issues.apache.org/jira/browse/LUCENE-2544
             Project: Lucene - Java
          Issue Type: Improvement
          Components: Other
    Affects Versions: 3.0.2
            Reporter: Woody Anderson
            Priority: Minor
             Fix For: 4.0, 3.0.2


In some cases, we want to index a timestamp or some other high-precision numeric at a much lower precision, but we still want to store the full-precision data.
Rather than having to do this with two Field objects in the Document, it'd be easier to provide NumericField with a divisor as well as a precision step. The divisor would apply before beginning the trie logic.

Most often, this is a divide by 1, but that will happen only during the constructor or setXXXValue() in NumericTokenStream.
I have the patch for this, or I will after I isolate it.
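
A rough sketch of the arithmetic described above, in plain Java with no Lucene dependency. It is not the attached patch: the names DivisorSketch, indexedValue and triePrefixes are hypothetical, and the prefix loop is only a simplification of the trie encoding, which in Lucene also makes the bits sortable and tags each term with its shift.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; none of these names exist in Lucene.
final class DivisorSketch {

    /** Value handed to the trie encoding: the full value reduced by the divisor. */
    static long indexedValue(long fullValue, long divisor) {
        if (divisor <= 0) {
            throw new IllegalArgumentException("divisor must be positive");
        }
        return fullValue / divisor; // a divisor of 1 leaves the value unchanged
    }

    /** One shifted copy of the value per precision level: shift = 0, step, 2*step, ... while below 64. */
    static List<Long> triePrefixes(long value, int precisionStep) {
        if (precisionStep < 1) {
            throw new IllegalArgumentException("precisionStep must be at least 1");
        }
        List<Long> prefixes = new ArrayList<Long>();
        for (int shift = 0; shift < 64; shift += precisionStep) {
            prefixes.add(value >> shift);
        }
        return prefixes;
    }

    public static void main(String[] args) {
        long millis = 1279424690123L;               // full precision, what would be stored
        long seconds = indexedValue(millis, 1000L); // reduced precision, what would be indexed
        System.out.println("stored=" + millis + " indexed=" + seconds);
        System.out.println("prefixes=" + triePrefixes(seconds, 8));
    }
}
```

The point of dividing first is that the lowest-precision digits never reach the index at all, while the stored value is left untouched.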


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] Commented: (LUCENE-2544) Add 'divisor' to NumericField, allows for easy storage of full precision data, but indexing *starting* at lower precision.

Posted by "Woody Anderson (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-2544?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12889648#action_12889648 ] 

Woody Anderson commented on LUCENE-2544:
----------------------------------------

Which is two Field objects, which is what I meant; I probably should have been more precise.

At any rate, it basically adds a numeric equivalent of DateTools.Resolution.

The patch is simple enough for me to always be able to patch releases etc. if you really don't like it, but I'm not sure what's so confusing about having one extra optional parameter to index a timestamp as seconds but store it as milliseconds, and to do that without adding another new Field(..);




[jira] Updated: (LUCENE-2544) Add 'divisor' to NumericField, allows for easy storage of full precision data, but indexing *starting* at lower precision.

Posted by "Woody Anderson (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-2544?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Woody Anderson updated LUCENE-2544:
-----------------------------------

    Attachment: LUCENE-2544.patch

Decided to leave NumericTokenStream untouched and simply modify the internals of NumericField, which makes the diff very small.




[jira] Updated: (LUCENE-2544) Add 'divisor' to NumericField, allows for easy storage of full precision data, but indexing *starting* at lower precision.

Posted by "Uwe Schindler (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-2544?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Uwe Schindler updated LUCENE-2544:
----------------------------------

    Fix Version/s:     (was: 3.0.2)




[jira] Commented: (LUCENE-2544) Add 'divisor' to NumericField, allows for easy storage of full precision data, but indexing *starting* at lower precision.

Posted by "Uwe Schindler (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-2544?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12913084#action_12913084 ] 

Uwe Schindler commented on LUCENE-2544:
---------------------------------------

It is two Field instances, but it results in only one field in the index. Stored fields and indexed fields are handled separately by the indexer, so there is no difference between a combined stored/indexed field and two separate Field instances (same field name!) where one is stored and the other is indexed. If you want to store something different from what you indexed, this is the way to go:
{code}
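// Indexed only: reduced-precision value used for searching.
doc.add(new NumericField(name, Field.Store.NO, true).setIntValue(xxx/divisor));
// Stored only: full-precision value returned with the document.
doc.add(new NumericField(name, Field.Store.YES, false).setIntValue(xxx));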
{code}




[jira] Commented: (LUCENE-2544) Add 'divisor' to NumericField, allows for easy storage of full precision data, but indexing *starting* at lower precision.

Posted by "Uwe Schindler (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-2544?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12889580#action_12889580 ] 

Uwe Schindler commented on LUCENE-2544:
---------------------------------------

I don't think this patch is a good idea; it creates too much confusion. You don't need to create *two* fields in the index to achieve the same thing. You only need two NumericField instances for the same field name: one indexed (with the divisor applied) and one stored (without the divisor applied).




[jira] Commented: (LUCENE-2544) Add 'divisor' to NumericField, allows for easy storage of full precision data, but indexing *starting* at lower precision.

Posted by "Woody Anderson (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-2544?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12913133#action_12913133 ] 

Woody Anderson commented on LUCENE-2544:
----------------------------------------

I really do understand the difference between Field, the instance object, and a field in the index. I use a capital F for the Java class and lowercase for the index.

You can accomplish this with two NFs, but you could also implement NumericField as a series of "new Field()" calls using the same name. But you don't do this, because it's much more convenient to have it bundled up in a nice, concise form.

There is (admittedly, from my perspective) one issue with this kind of feature: the divisor must be known and kept track of by the Lucene user during query parsing, and during term-enum inspection if they are doing that sort of thing. The current QueryParser uses a map of field to DateTools.Resolution, which this mechanism would effectively mimic, though it would produce NumericField-formatted tokens in the index rather than date strings, which can often be an advantage for ranges etc. The fact that it also provides numeric resolution for any numeric field is a bonus, but it would involve some change to the QueryParser to handle this correctly, as it currently does not handle querying any field indexed as NumericField. Both this edit and DateTools have the same drawbacks for term-enum inspection (facets etc.), so clearly the responsibility for handling that already lies with the user of Lucene. I have a schema at parse/inspect time, so I had overlooked this issue.
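
To make the bookkeeping concrete, here is a minimal sketch of a per-field divisor map on the query side, loosely analogous to the QueryParser's field-to-DateTools.Resolution map. It is plain Java with no Lucene dependency; DivisorSchema and its methods are hypothetical names, not part of Lucene or of the patch, and it assumes non-negative values to keep the rounding simple.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper; not part of Lucene or of the attached patch.
final class DivisorSchema {
    private final Map<String, Long> divisors = new HashMap<String, Long>();

    /** Records the divisor that was used when the field was indexed. */
    void setDivisor(String field, long divisor) {
        if (divisor <= 0) {
            throw new IllegalArgumentException("divisor must be positive");
        }
        divisors.put(field, divisor);
    }

    /** Fields without a registered divisor are treated as full precision (divisor 1). */
    long divisorFor(String field) {
        Long d = divisors.get(field);
        return d == null ? 1L : d;
    }

    /** Scales full-precision inclusive bounds down to the precision stored in the index. */
    long[] toIndexRange(String field, long minInclusive, long maxInclusive) {
        long d = divisorFor(field);
        return new long[] { minInclusive / d, maxInclusive / d };
    }
}
```

With a divisor of 1000 on a millisecond timestamp, a requested range of 1500 to 2500 becomes 1 to 2 in index terms, which covers every value from 1000 to 2999. The scaled range is therefore coarser than the request, and a caller who needs exact bounds would have to filter on the stored full-precision value afterwards.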

At any rate, I still don't get what you consider confusing about this functionality. DateTools.Resolution shows a clear use case, and the modern NumericField features for fast ranges etc. are often a clear improvement over string date tokens at any resolution. Wrapping it up into the single existing class is just easier to use than requiring that multiple NF objects be added to the document. Unless you advocate that NF be implemented as a static utility class that injects multiple Field objects into the Document, I'm not sure why this consolidation goes against the grain.

