Posted to dev@lucene.apache.org by "Ryan McKinley (JIRA)" <ji...@apache.org> on 2008/12/19 15:33:44 UTC

[jira] Created: (LUCENE-1496) Move solr NumberUtils to lucene

Move solr NumberUtils to lucene
-------------------------------

                 Key: LUCENE-1496
                 URL: https://issues.apache.org/jira/browse/LUCENE-1496
             Project: Lucene - Java
          Issue Type: Task
            Reporter: Ryan McKinley
            Priority: Trivial
             Fix For: 2.9


Solr includes a NumberUtils class with some general utilities for dealing with tokens and numbers.

This should be in Lucene rather than Solr.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


[jira] Commented: (LUCENE-1496) Move solr NumberUtils to lucene

Posted by "Michael McCandless (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-1496?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12718473#action_12718473 ] 

Michael McCandless commented on LUCENE-1496:
--------------------------------------------

Uwe, with trie now handling 32-bit values and coming into core, does Lucene need Solr's NumberUtils?  Are there compelling things that Solr's NumberUtils does that TrieUtils does not?



[jira] Resolved: (LUCENE-1496) Move solr NumberUtils to lucene

Posted by "Michael McCandless (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-1496?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael McCandless resolved LUCENE-1496.
----------------------------------------

    Resolution: Won't Fix

OK, I'm resolving as "won't fix".  It sounds like Lucene only needs TrieUtils...



[jira] Commented: (LUCENE-1496) Move solr NumberUtils to lucene

Posted by "Uwe Schindler (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-1496?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12659445#action_12659445 ] 

Uwe Schindler commented on LUCENE-1496:
---------------------------------------

I am thinking about extending TrieUtils and TrieRangeQuery to support 32-bit values (ints and floats). With that change, the other methods in NumberUtils would become obsolete, too. This is just a suggestion; maybe we should discuss it a little more.

On my first look through the code, I had not seen that NumberUtils also supports doubles, like TrieUtils; the only difference is the use of doubleToLongBits() vs. doubleToRawLongBits(). I am not sure which is better :-( Does anybody know more about this? In my opinion, the version with/without raw also normalizes doubles, so that NaN compares with ==. Is there any other difference?
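
For reference, a minimal Java sketch of the difference between the two JDK methods mentioned above. It relies only on their documented behavior: Double.doubleToLongBits() collapses every NaN to the canonical pattern 0x7ff8000000000000L, while Double.doubleToRawLongBits() keeps the NaN bit pattern it is given. The class name, the helper names, and the payload value are made up for illustration; nothing here is taken from NumberUtils or TrieUtils.

```java
public class NaNBitsDemo {
    // A quiet NaN carrying a non-zero payload (arbitrary illustrative value).
    static final long PAYLOAD_NAN_BITS = 0x7ff8000000000001L;

    // Normalizing variant: every NaN maps to 0x7ff8000000000000L.
    static long canonical(double d) {
        return Double.doubleToLongBits(d);
    }

    // Raw variant: the NaN payload is not normalized.
    static long raw(double d) {
        return Double.doubleToRawLongBits(d);
    }

    public static void main(String[] args) {
        double nan = Double.longBitsToDouble(PAYLOAD_NAN_BITS);
        System.out.println("canonical: " + Long.toHexString(canonical(nan)));
        System.out.println("raw:       " + Long.toHexString(raw(nan)));
        // For every non-NaN value the two methods agree.
        System.out.println("same for 1.5: " + (canonical(1.5) == raw(1.5)));
    }
}
```

So the two methods can only differ for NaN inputs: the non-raw version is the one that normalizes, which matters if encoded NaN values are later compared bit for bit.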

Regarding my changes to TrieUtils for 32-bit support: I am currently not sure how to do this elegantly. In particular, the auto-detection of the trie encoding does not cope well with this change :-( As 2.9 is not yet released, I have time to change the classes and signatures without worrying about deprecation and format changes. So this is a good point to unify TrieUtils and NumberUtils. Maybe TrieRangeQuery will make it into the core when flexible indexing is available.

So my questions: Is anybody interested in TrieUtils also supporting 32-bit values? Why not unify NumberUtils and TrieUtils? Any ideas?



[jira] Commented: (LUCENE-1496) Move solr NumberUtils to lucene

Posted by "Uwe Schindler (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-1496?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12659319#action_12659319 ] 

Uwe Schindler commented on LUCENE-1496:
---------------------------------------

I looked into the code of NumberUtils:

The encoding is very similar to that of TrieUtils (used in TrieRangeQuery, see LUCENE-1470, http://hudson.zones.apache.org/hudson/job/Lucene-trunk/javadoc//org/apache/lucene/search/trie/TrieUtils.html). The only difference between TrieUtils and NumberUtils is the more compact encoding in NumberUtils (TrieUtils.VARIANT_8BIT uses one character per byte, while NumberUtils uses 14 bits per character). TrieUtils also works correctly with String.compareTo() (that was the intention behind TrieUtils).
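
To illustrate what "one character per byte" and "works correctly with String.compareTo()" can mean in practice, here is a minimal Java sketch. It is not the actual TrieUtils.VARIANT_8BIT or NumberUtils format; the class and method names are hypothetical, and the layout (sign bit flipped, 8 chars, most significant byte first) is simply one common way to make string order match numeric order.

```java
public class SortableLongString {
    // Encode a long as 8 chars, one byte (0..255) per char, most significant first.
    static String encode(long value) {
        // Flip the sign bit so negative values sort before positive ones
        // when the bytes are compared as unsigned quantities.
        long shifted = value ^ Long.MIN_VALUE;
        char[] out = new char[8];
        for (int i = 0; i < 8; i++) {
            out[i] = (char) ((shifted >>> (56 - 8 * i)) & 0xFF);
        }
        return new String(out);
    }

    // Inverse of encode(): rebuild the long and flip the sign bit back.
    static long decode(String s) {
        long shifted = 0L;
        for (int i = 0; i < 8; i++) {
            shifted = (shifted << 8) | (s.charAt(i) & 0xFF);
        }
        return shifted ^ Long.MIN_VALUE;
    }

    public static void main(String[] args) {
        // String order follows numeric order because all strings have the same length.
        System.out.println(encode(-5L).compareTo(encode(3L)) < 0);
        System.out.println(decode(encode(-42L)));
    }
}
```

With 8 bits per char a long needs 8 chars; a 14-bit-per-char scheme would need only 5 chars (5 x 14 = 70 bits), which is the compactness advantage attributed to NumberUtils above.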

In my opinion, TrieUtils has some more advantages:
- Doubles are encoded in a correctly sortable way (even Double.XXX_INFINITY!), using the IEEE binary representation of doubles with some bit alignments.
- Direct support for Dates and longs
- Built-in comparator for the new SortField constructor (LUCENE-1478) and a nice SortField factory. This maps all encoded values to a FieldCache with long values (even for dates or doubles, because there is no difference; longs have the fastest encoding/decoding speed, and for sorting the real values are not interesting).
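
The first point, encoding doubles so that they sort correctly, can be sketched as follows. This is a commonly used technique based on the IEEE 754 bit layout, shown under the assumption that it resembles what is meant by "some bit alignments"; it is not claimed to be the exact TrieUtils code, and the class and method names are hypothetical.

```java
public class SortableDouble {
    // Map a double to a long whose signed order matches the numeric order
    // of the doubles (NaN ends up above positive infinity).
    static long toSortableLong(double d) {
        long bits = Double.doubleToLongBits(d);
        // Positive doubles already sort correctly as signed longs.
        // For negative doubles, flip all bits except the sign bit so that
        // a larger magnitude yields a smaller long.
        return bits < 0 ? bits ^ 0x7fffffffffffffffL : bits;
    }

    // The transformation is its own inverse on the bit level.
    static double fromSortableLong(long sortable) {
        long bits = sortable < 0 ? sortable ^ 0x7fffffffffffffffL : sortable;
        return Double.longBitsToDouble(bits);
    }

    public static void main(String[] args) {
        double[] values = {Double.NEGATIVE_INFINITY, -1.0, -0.0, 0.0, 1.0, Double.POSITIVE_INFINITY};
        for (double v : values) {
            System.out.println(v + " -> " + toSortableLong(v));
        }
    }
}
```

Because the result is a plain long, the same FieldCache of long values can serve sorting for longs, dates, and doubles alike.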

The only problem is that indexes encoded with the old NumberUtils are not readable by TrieUtils. But if we include such things in Lucene, we should not duplicate code and create yet more new encodings.

For the more compact encoding, TrieUtils could be extended to also support a "14bit" trie variant (which would not work for real trie encoding, but could be used simply to store longs very compactly). On the other hand, anybody who uses NumberUtils may also be interested in TrieRangeQuery, and so should use TrieUtils.VARIANT_8BIT.

So I think we should perhaps leave NumberUtils in Solr and use TrieUtils in Lucene. LocalLucene should then also use TrieUtils. And Solr may switch to trie encoding with its next major version, too.



[jira] Commented: (LUCENE-1496) Move solr NumberUtils to lucene

Posted by "Michael McCandless (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-1496?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12683149#action_12683149 ] 

Michael McCandless commented on LUCENE-1496:
--------------------------------------------

If we move trie/* into core, what do we need/want to fold in from Solr's NumberUtils?



[jira] Commented: (LUCENE-1496) Move solr NumberUtils to lucene

Posted by "Ryan McKinley (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-1496?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12659355#action_12659355 ] 

Ryan McKinley commented on LUCENE-1496:
---------------------------------------

Should the number functions from TrieUtils be moved to a Lucene NumberUtils?

API-wise, if I were looking for ways to encode numbers, I doubt I would look at "TrieUtils.java".

What about the non-long/double functions in NumberUtils?



[jira] Commented: (LUCENE-1496) Move solr NumberUtils to lucene

Posted by "Uwe Schindler (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-1496?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12718476#action_12718476 ] 

Uwe Schindler commented on LUCENE-1496:
---------------------------------------

I think we can close this now. I originally wanted to make this issue dependent on the move-to-core issue and close it together with trie.

There is now a private copy of NumberUtils inside LocalLucene, but this should be removed soon and replaced by TrieUtils.

But Yonik said: NumberUtils is also only in Solr for compatibility reasons. The binary format used by NumberUtils is not very index-friendly (it uses the full UTF-16 range and therefore has the UTF-8 decoding overhead), so it should not be used for new developments. I suggest using TrieUtils (or NumberUtils to do the same). TrieUtils uses only 7 bits per char and so is unaffected by the UTF-8 encoding (it is simply ASCII-only).
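
The UTF-8 point can be checked with a small Java sketch. It assumes only standard UTF-8 behavior: chars below 0x80 take one byte, chars from 0x80 to 0x7FF take two, and other non-surrogate chars take three. The encode7Bit() helper is a hypothetical illustration of a 7-bits-per-char layout, not the TrieUtils format.

```java
import java.nio.charset.StandardCharsets;

public class Utf8Overhead {
    // Number of bytes a string occupies once written as UTF-8.
    static int utf8Length(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    // Hypothetical 7-bit layout: a long spread over ceil(64 / 7) = 10 chars,
    // each in the ASCII range 0..127, most significant group first.
    static String encode7Bit(long value) {
        char[] out = new char[10];
        for (int i = 9; i >= 0; i--) {
            out[i] = (char) (value & 0x7F);
            value >>>= 7;
        }
        return new String(out);
    }

    public static void main(String[] args) {
        // ASCII-only chars: one UTF-8 byte per char, so 10 bytes in total.
        System.out.println(utf8Length(encode7Bit(Long.MAX_VALUE)));
        // A char from the upper UTF-16 range needs three UTF-8 bytes.
        System.out.println(utf8Length(String.valueOf((char) 0x4e2d)));
    }
}
```

An encoding that uses the full UTF-16 range therefore saves chars but pays for it with multi-byte sequences in a UTF-8 encoded index, whereas an ASCII-only encoding maps char for char to single bytes.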
