Posted to dev@lucene.apache.org by "Ryan McKinley (JIRA)" <ji...@apache.org> on 2010/09/26 00:39:38 UTC

[jira] Created: (SOLR-2134) Trie* fields should support sortMissingLast=true, and deprecate Sortable* Field Types

Trie* fields should support sortMissingLast=true, and deprecate Sortable* Field Types
-------------------------------------------------------------------------------------

                 Key: SOLR-2134
                 URL: https://issues.apache.org/jira/browse/SOLR-2134
             Project: Solr
          Issue Type: Improvement
          Components: Schema and Analysis
            Reporter: Ryan McKinley


With the changes in LUCENE-2649, the FieldCache also returns whether each value is valid (set) or not.  This is enough to support sortMissingLast=true with Trie* fields.  Then we can get rid of the Sortable* fields.
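
To make the idea concrete, here is a minimal sketch of a value array paired with "valid" bits. It is not Lucene's FieldCache API; the class `ValidBitsSketch` and its method `sortValue` are hypothetical names used only for illustration. The point is that the bits let a sort distinguish a document with no value from a document whose real value happens to be 0.

```java
import java.util.BitSet;

// Hypothetical illustration, not the Lucene FieldCache API.
final class ValidBitsSketch {
    final int[] values;  // one slot per document; stays 0 where nothing was indexed
    final BitSet valid;  // bit is set only for documents that really have a value

    ValidBitsSketch(Integer[] perDoc) {
        values = new int[perDoc.length];
        valid = new BitSet(perDoc.length);
        for (int doc = 0; doc < perDoc.length; doc++) {
            if (perDoc[doc] != null) {
                values[doc] = perDoc[doc];
                valid.set(doc);
            }
        }
    }

    // Value used for sorting: the real value, or the supplied sentinel when missing.
    int sortValue(int doc, int missingValue) {
        return valid.get(doc) ? values[doc] : missingValue;
    }

    public static void main(String[] args) {
        ValidBitsSketch cache = new ValidBitsSketch(new Integer[] {7, null, 0});
        System.out.println(cache.sortValue(1, Integer.MAX_VALUE)); // 2147483647 (missing)
        System.out.println(cache.sortValue(2, Integer.MAX_VALUE)); // 0 (a real zero)
    }
}
```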



-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] Commented: (SOLR-2134) Trie* fields should support sortMissingLast=true, and deprecate Sortable* Field Types

Posted by "Yonik Seeley (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-2134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12915827#action_12915827 ] 

Yonik Seeley commented on SOLR-2134:
------------------------------------

bq. Hmmm is that going to be a perf hit (Double.compare) for the common case (no NaNs)? 

Agree.  We shouldn't need to support NaNs as a fieldcache value.  If we want that in the future, it can be a different comparator.



[jira] Commented: (SOLR-2134) Trie* fields should support sortMissingLast=true, and deprecate Sortable* Field Types

Posted by "Yonik Seeley (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-2134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12915132#action_12915132 ] 

Yonik Seeley commented on SOLR-2134:
------------------------------------

Heh, yeah - Double.MIN_VALUE is only the smallest positive number... definitely deceiving.
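
A quick way to see the point above, as a minimal sketch: the class `MinValueDemo` and its helper `sortsBelowAll` are illustrative names, not anything from the patch. It checks whether a candidate sentinel really compares below every value in a small sample.

```java
// Illustrative only: why MIN_VALUE is the wrong "smallest" sentinel for floating point.
final class MinValueDemo {
    // True if the candidate sentinel compares strictly below every value in the sample.
    static boolean sortsBelowAll(double sentinel, double[] sample) {
        for (double v : sample) {
            if (!(sentinel < v)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        double[] sample = {-5.0, 0.0, 3.5};
        System.out.println(Double.MIN_VALUE > 0.0);                          // true
        System.out.println(sortsBelowAll(Double.MIN_VALUE, sample));         // false
        System.out.println(sortsBelowAll(Double.NEGATIVE_INFINITY, sample)); // true
        // For integers, MIN_VALUE really is the most negative value.
        System.out.println(sortsBelowAll(Integer.MIN_VALUE, sample));        // true
    }
}
```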



[jira] Commented: (SOLR-2134) Trie* fields should support sortMissingLast=true, and deprecate Sortable* Field Types

Posted by "Uwe Schindler (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-2134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12915792#action_12915792 ] 

Uwe Schindler commented on SOLR-2134:
-------------------------------------

I have not looked into it closely (because the Solr sorting stuff is out of my scope), but I have one comment about floats and doubles:
The problem with using infinity as the replacement value is that NaN values are still undefined and may be ordered before or after these infinity values. But I think the problem is minor.
The same problem applies if you have infinity itself, or the long/double min/max values, as a field value: then the sorting is also undefined (the unset values should go after/before all real infinities).



[jira] Commented: (SOLR-2134) Trie* fields should support sortMissingLast=true, and deprecate Sortable* Field Types

Posted by "Ryan McKinley (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-2134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12915828#action_12915828 ] 

Ryan McKinley commented on SOLR-2134:
-------------------------------------

Is it worth the added complexity and potential performance hit to fudge the behavior for the edge case?

As long as things don't crash with NaN, I'm not sure if worrying about NaN vs Infinity sorting is worth it.

I don't really care though.

bq. as it has a defined order 

I don't see anything in the javadocs -- are you just saying it is defined because it is the same as what your JVM will do?  Conceptually I guess NaN is closer to zero than Infinity, but the other way round seems just as likely.



[jira] Commented: (SOLR-2134) Trie* fields should support sortMissingLast=true, and deprecate Sortable* Field Types

Posted by "Uwe Schindler (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-2134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12915129#action_12915129 ] 

Uwe Schindler commented on SOLR-2134:
-------------------------------------

For floats you cannot use MIN_VALUE and MAX_VALUE (they are defined differently for floats/doubles); use NEGATIVE_INFINITY and POSITIVE_INFINITY instead. That's all :-)



[jira] Commented: (SOLR-2134) Trie* fields should support sortMissingLast=true, and deprecate Sortable* Field Types

Posted by "Ryan McKinley (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-2134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12915809#action_12915809 ] 

Ryan McKinley commented on SOLR-2134:
-------------------------------------

{quote}
The problem with the approach of setting infinity as replacement value is, that NaN values are still undefined and may be ordered before/after these infinity values. But I think the problem is minor.
The same problem applies if you have infinity itself or for long/double min/max values as field value, then the sorting is also undefined (the not set values should go after/before all real infinities).
{quote}

What behavior would you expect when a value is NaN or +/- infinity?  It seems OK to have that be undefined (but towards the end).

Alternatively we could do something like:
{code:java}

    @Override
    public void copy(int slot, int doc) {
      if( checkMissing ) {
        if( cached.valid != null && !cached.valid.get(doc) ) {
          values[slot] = missingValue;
          return;
        }
        if( cached.values[doc] == Double.NaN ) {
          cached.values[doc] = missingValue; //???  perhaps check sign and go 2+- that value?  2 because INFINITY may be +- 1
        }
        else if( cached.values[doc] == Double.POSITIVE_INFINITY ) {
          cached.values[doc] = Double.POSITIVE_INFINITY - 1; //???  2 just 
        }
        else if( cached.values[doc] == Double.NEGATIVE_INFINITY ) {
          cached.values[doc] = Double.NEGATIVE_INFINITY + 1; //???
        }
      }
      values[slot] = cached.values[doc];
    }
{code}
I have not checked whether adding anything to infinity makes a difference...  it is CS, not the real world, after all.





[jira] Updated: (SOLR-2134) Trie* fields should support sortMissingLast=true, and deprecate Sortable* Field Types

Posted by "Ryan McKinley (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/SOLR-2134?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ryan McKinley updated SOLR-2134:
--------------------------------

    Attachment: SOLR-2134-SortMissingLast.patch

Here is an updated patch that includes everything from LUCENE-2671.

This also introduces NumericFieldCacheSource in Solr so that the ValueSources use the new EntryCreator API with flags set to generate the valid bits.

I still need to get tests included for Solr.



[jira] Updated: (SOLR-2134) Trie* fields should support sortMissingLast=true, and deprecate Sortable* Field Types

Posted by "Ryan McKinley (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/SOLR-2134?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ryan McKinley updated SOLR-2134:
--------------------------------

    Attachment: SOLR-2134-SortMissingLast.patch

An updated patch with tests that *almost* works.

The tests for int and long fields work great, but float/double fields do not behave as I would expect: sort desc does not put the missing fields at the bottom.

I follow the same pattern that works for int/long:
{code:java}

      case FLOAT:
        if( sortMissingLast ) {
          missingValue = top ? Float.MIN_VALUE : Float.MAX_VALUE;
        }
        else if( sortMissingFirst ) {
          missingValue = top ? Float.MAX_VALUE : Float.MIN_VALUE;
        }
        return new SortField( new FloatValuesCreator( field.getName(), 
            FieldCache.NUMERIC_UTILS_FLOAT_PARSER, flags ), top).setMissingValue( missingValue );
{code}


Any ideas?








[jira] Updated: (SOLR-2134) Trie* fields should support sortMissingLast=true, and deprecate Sortable* Field Types

Posted by "Ryan McKinley (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/SOLR-2134?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ryan McKinley updated SOLR-2134:
--------------------------------

    Attachment: SOLR-2134-SortMissingLast.patch

Ahhh -- well that fixes things!

Using top NEGATIVE_INFINITY / POSITIVE_INFINITY makes all the tests pass.

If someone has some time, I think this is ready to go.
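
The choice of sentinel can be sketched as follows. This is a hypothetical, self-contained illustration and not the code from the patch: `MissingSentinelDemo`, `missingValue`, and `sort` are made-up names, `null` stands for a document without a value, and the `descending` parameter is assumed to play the role of the `top` flag in the snippets quoted in this thread.

```java
import java.util.Arrays;

// Hypothetical sketch, not the code from the patch.
final class MissingSentinelDemo {
    // Picks the value substituted for documents that have no value.
    // 'descending' stands in for the 'top' flag (assumed to mean a reversed sort).
    static float missingValue(boolean sortMissingLast, boolean descending) {
        // Missing docs end up last if the sentinel is the largest value in an
        // ascending sort, or the smallest value in a descending sort.
        boolean wantLargest = (sortMissingLast != descending);
        return wantLargest ? Float.POSITIVE_INFINITY : Float.NEGATIVE_INFINITY;
    }

    // Sorts per-document values (null = missing) after substituting the sentinel.
    static float[] sort(Float[] docs, boolean sortMissingLast, boolean descending) {
        float sentinel = missingValue(sortMissingLast, descending);
        float[] out = new float[docs.length];
        for (int i = 0; i < docs.length; i++) {
            out[i] = (docs[i] == null) ? sentinel : docs[i];
        }
        Arrays.sort(out); // ascending
        if (descending) {
            // Reverse in place to get descending order.
            for (int i = 0, j = out.length - 1; i < j; i++, j--) {
                float tmp = out[i];
                out[i] = out[j];
                out[j] = tmp;
            }
        }
        return out;
    }

    public static void main(String[] args) {
        Float[] docs = {1.5f, null, -3f};
        System.out.println(Arrays.toString(sort(docs, true, false))); // [-3.0, 1.5, Infinity]
        System.out.println(Arrays.toString(sort(docs, true, true)));  // [1.5, -3.0, -Infinity]
    }
}
```

With Float.MIN_VALUE in place of NEGATIVE_INFINITY, the descending case would place the missing document between 1.5 and -3.0, which matches the failing behavior described earlier in the thread.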



[jira] Issue Comment Edited: (SOLR-2134) Trie* fields should support sortMissingLast=true, and deprecate Sortable* Field Types

Posted by "Uwe Schindler (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-2134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12915831#action_12915831 ] 

Uwe Schindler edited comment on SOLR-2134 at 9/28/10 12:59 PM:
---------------------------------------------------------------

Comparing NaN with anything always returns false. This is not JVM dependent; it is mathematically correct and defined in the IEEE-754 standard: [http://en.wikipedia.org/wiki/NaN], [http://forums.sun.com/thread.jspa?threadID=5419285]

The coolest thing is how you test for NaN (from JDK source code):

{code}
public static boolean isNaN(double v) {
  return (v != v);
}
{code}

This really returns true only for NaN because, as I said, NaN compares as unequal to everything, including itself. The problem for Lucene's/Solr's sorting is that the PriorityQueue uses lessThan, which always returns false when NaN is involved, so the PriorityQueue gets mixed up. You can see this because NaN values end up mixed in between the other values, depending on the order in which they were inserted.

So to remove the method call above use:

{code}
if (cached.values[doc] != cached.values[doc]) // test for NaN
{code}

Looks perverse but is correct *g*

Not related to that: one thing about your patches, some of which are already committed. Can you please follow the Lucene coding conventions (no extra spaces in if statements, and the opening { of a method declaration goes on the same line)? We have an Eclipse style file in the wiki.
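
The behavior described in this comment can be checked with a small sketch. The class `NaNLessThanDemo` and its methods are illustrative stand-ins: `lessThan` only has the same shape as an ordering test built on the primitive `<` operator and is not the actual PriorityQueue code.

```java
// Illustrative only: how NaN behaves with primitive comparisons.
final class NaNLessThanDemo {
    // Same shape as an ordering test built on the primitive operator.
    static boolean lessThan(double a, double b) {
        return a < b;
    }

    // The self-inequality test quoted above.
    static boolean isNaN(double v) {
        return v != v;
    }

    public static void main(String[] args) {
        double nan = Double.NaN;
        System.out.println(lessThan(nan, 1.0));              // false
        System.out.println(lessThan(1.0, nan));              // false
        System.out.println(nan == nan);                      // false
        System.out.println(isNaN(nan));                      // true
        System.out.println(isNaN(Double.POSITIVE_INFINITY)); // false
    }
}
```

Because neither `lessThan(nan, x)` nor `lessThan(x, nan)` holds, a queue ordered this way has no consistent position for NaN.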



[jira] Commented: (SOLR-2134) Trie* fields should support sortMissingLast=true, and deprecate Sortable* Field Types

Posted by "Ryan McKinley (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-2134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12916011#action_12916011 ] 

Ryan McKinley commented on SOLR-2134:
-------------------------------------

Committed to trunk in #1002464. This should be backported when LUCENE-2665 is ready for 3.x.

This patch does not yet deprecate the Sortable* field types.   



[jira] Commented: (SOLR-2134) Trie* fields should support sortMissingLast=true, and deprecate Sortable* Field Types

Posted by "Ryan McKinley (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-2134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12915780#action_12915780 ] 

Ryan McKinley commented on SOLR-2134:
-------------------------------------

Anyone get a chance to look at this?  I'd like to commit it soon.



[jira] Commented: (SOLR-2134) Trie* fields should support sortMissingLast=true, and deprecate Sortable* Field Types

Posted by "Uwe Schindler (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-2134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12915816#action_12915816 ] 

Uwe Schindler commented on SOLR-2134:
-------------------------------------

I don't expect that adding/subtracting anything to/from infinity changes its value (at least the IEEE-754 spec says it should not change anything). NaN order is not undefined, but it behaves differently than you expect (if you ever compare NaN with anything using <, >, or ==, it will always return false). This will mix up the sorting, so defining it as missingValue is maybe a good idea. But as said before, you cannot compare with NaN, it will always return false, so use Double.isNaN(cached.values[doc]) :-)
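
The claim about infinity is easy to check in plain Java. This is a minimal sketch; `InfinityArithmeticDemo` and `minusOne` are illustrative names that mirror the "infinity minus one" idea from the proposed copy() method.

```java
// Illustrative check of IEEE-754 infinity arithmetic in Java.
final class InfinityArithmeticDemo {
    // Mirrors the "infinity minus one" idea from the proposed copy() method.
    static double minusOne(double v) {
        return v - 1;
    }

    public static void main(String[] args) {
        double inf = Double.POSITIVE_INFINITY;
        // Subtracting a finite amount leaves infinity unchanged.
        System.out.println(minusOne(inf) == inf);                                     // true
        System.out.println(Double.NEGATIVE_INFINITY + 1 == Double.NEGATIVE_INFINITY); // true
        // The largest finite double still sorts below infinity.
        System.out.println(Double.MAX_VALUE < inf);                                   // true
    }
}
```

So nudging an infinite field value by 1 cannot move it away from an infinite sentinel.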



[jira] Commented: (SOLR-2134) Trie* fields should support sortMissingLast=true, and deprecate Sortable* Field Types

Posted by "Uwe Schindler (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-2134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12915822#action_12915822 ] 

Uwe Schindler commented on SOLR-2134:
-------------------------------------

Maybe a good idea is to use [http://download.oracle.com/javase/1.5.0/docs/api/java/lang/Double.html#compare(double, double)] to compare the doubles in the comparator as it has a defined order for all values including NaN!
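
As a small illustration of that defined order: Double.compare follows the same total order as Double.compareTo, in which NaN is treated as equal to itself and greater than every other value, including POSITIVE_INFINITY. Arrays.sort(double[]) is documented to use that order as well. The class `DoubleCompareDemo` and its helper `sortedCopy` below are illustrative names only.

```java
import java.util.Arrays;

// Illustrative only: the total order used by Double.compare and Arrays.sort(double[]).
final class DoubleCompareDemo {
    // Returns a sorted copy so the input array is left untouched.
    static double[] sortedCopy(double[] in) {
        double[] out = in.clone();
        Arrays.sort(out);
        return out;
    }

    public static void main(String[] args) {
        double[] values = {Double.NaN, 1.0, Double.POSITIVE_INFINITY,
                           Double.NEGATIVE_INFINITY, -2.5};
        System.out.println(Arrays.toString(sortedCopy(values)));
        // [-Infinity, -2.5, 1.0, Infinity, NaN]
        System.out.println(Double.compare(Double.NaN, Double.POSITIVE_INFINITY) > 0); // true
        System.out.println(Double.compare(Double.NaN, Double.NaN) == 0);              // true
    }
}
```

Whether the extra method call is worth it for the common case without NaNs is the trade-off raised elsewhere in this thread.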

> Trie* fields should support sortMissingLast=true, and deprecate Sortable* Field Types
> -------------------------------------------------------------------------------------
>
>                 Key: SOLR-2134
>                 URL: https://issues.apache.org/jira/browse/SOLR-2134
>             Project: Solr
>          Issue Type: Improvement
>          Components: Schema and Analysis
>            Reporter: Ryan McKinley
>         Attachments: SOLR-2134-SortMissingLast.patch, SOLR-2134-SortMissingLast.patch, SOLR-2134-SortMissingLast.patch, SOLR-2134-SortMissingLast.patch
>
>
> With the changes in LUCENE-2649, the FieldCache also returns if the bit is valid or not.  This is enough to support sortMissingLast=true with Trie* fields.  Then we can get rid of the Sortable* fields



[jira] Commented: (SOLR-2134) Trie* fields should support sortMissingLast=true, and deprecate Sortable* Field Types

Posted by "Ryan McKinley (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-2134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12915089#action_12915089 ] 

Ryan McKinley commented on SOLR-2134:
-------------------------------------

This is the Trie* getSortField code:
{code:java}

  public SortField getSortField(SchemaField field, boolean top) {
    int flags = CachedArrayCreator.CACHE_VALUES_AND_BITS;
    Object missingValue = null;
    boolean sortMissingLast  = on( SORT_MISSING_LAST,  properties );
    boolean sortMissingFirst = on( SORT_MISSING_FIRST, properties );
    
    switch (type) {
      case INTEGER:
        if( sortMissingLast ) {
          missingValue = Integer.MAX_VALUE;
        }
        else if( sortMissingFirst ) {
          missingValue = Integer.MIN_VALUE;
        }
        return new SortField( new IntValuesCreator( field.getName(), 
            FieldCache.NUMERIC_UTILS_INT_PARSER, flags ), top).setMissingValue( missingValue );
      
      case FLOAT:
        if( sortMissingLast ) {
          missingValue = Float.MAX_VALUE;
        }
        else if( sortMissingFirst ) {
          missingValue = -Float.MAX_VALUE; // not Float.MIN_VALUE, which is the smallest positive float
        }
        return new SortField( new FloatValuesCreator( field.getName(), 
            FieldCache.NUMERIC_UTILS_FLOAT_PARSER, flags ), top).setMissingValue( missingValue );
      
      case DATE: // fallthrough
      case LONG:
        if( sortMissingLast ) {
          missingValue = Long.MAX_VALUE;
        }
        else if( sortMissingFirst ) {
          missingValue = Long.MIN_VALUE;
        }
        return new SortField( new LongValuesCreator( field.getName(), 
            FieldCache.NUMERIC_UTILS_LONG_PARSER, flags ), top).setMissingValue( missingValue );
        
      case DOUBLE:
        if( sortMissingLast ) {
          missingValue = Double.MAX_VALUE;
        }
        else if( sortMissingFirst ) {
          missingValue = -Double.MAX_VALUE; // not Double.MIN_VALUE, which is the smallest positive double
        }
        return new SortField( new DoubleValuesCreator( field.getName(), 
            FieldCache.NUMERIC_UTILS_DOUBLE_PARSER, flags ), top).setMissingValue( missingValue );
        
      default:
        throw new SolrException(SolrException.ErrorCode.SERVER_ERROR, "Unknown type for trie field: " + field.name);
    }
  }
{code}
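
One detail worth keeping in mind when picking sentinel values like the ones in getSortField: Integer.MIN_VALUE and Long.MIN_VALUE are the most negative values of their types, but Float.MIN_VALUE and Double.MIN_VALUE are the smallest positive values. The following standalone sketch (the helper names are hypothetical and not part of the patch) shows why a "sort first" sentinel for floating-point types needs something like -Float.MAX_VALUE or negative infinity instead.

```java
final class MinValueSentinels {
    // Hypothetical helpers: the lowest finite values of the floating-point types.
    static float lowestFiniteFloat() {
        return -Float.MAX_VALUE;
    }

    static double lowestFiniteDouble() {
        return -Double.MAX_VALUE;
    }

    public static void main(String[] args) {
        // Integer types: MIN_VALUE really is the lowest value.
        System.out.println(Integer.MIN_VALUE < 0);   // true
        System.out.println(Long.MIN_VALUE < 0);      // true

        // Floating-point types: MIN_VALUE is a tiny positive number.
        System.out.println(Float.MIN_VALUE > 0);     // true
        System.out.println(Double.MIN_VALUE > 0);    // true

        // So a document holding -1.0 sorts before a Float.MIN_VALUE sentinel ...
        System.out.println(Float.compare(-1.0f, Float.MIN_VALUE) < 0);  // true
        // ... but after the lowest finite float.
        System.out.println(lowestFiniteFloat() < -1.0f);                // true
    }
}
```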

> Trie* fields should support sortMissingLast=true, and deprecate Sortable* Field Types
> -------------------------------------------------------------------------------------
>
>                 Key: SOLR-2134
>                 URL: https://issues.apache.org/jira/browse/SOLR-2134
>             Project: Solr
>          Issue Type: Improvement
>          Components: Schema and Analysis
>            Reporter: Ryan McKinley
>         Attachments: SOLR-2134-SortMissingLast.patch, SOLR-2134-SortMissingLast.patch
>
>
> With the changes in LUCENE-2649, the FieldCache also returns if the bit is valid or not.  This is enough to support sortMissingLast=true with Trie* fields.  Then we can get rid of the Sortable* fields



[jira] Issue Comment Edited: (SOLR-2134) Trie* fields should support sortMissingLast=true, and deprecate Sortable* Field Types

Posted by "Uwe Schindler (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-2134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12915816#action_12915816 ] 

Uwe Schindler edited comment on SOLR-2134 at 9/28/10 12:30 PM:
---------------------------------------------------------------

I don't expect that adding or subtracting anything to/from infinity changes its value (at least the IEEE-754 spec says it should not). I would keep this as it is; I just wanted to note that the value used for missing (undefined) entries may collide with a real value. So if you have e.g. Integer.MAX_VALUE in the slot but your missingValue is also Integer.MAX_VALUE, then sorting at this place is strange. But this affects all data types and has always been like that?

The problem is NaN (as it is with function queries where the score is NaN):
NaN order is not undefined but behaves differently than you might expect: if you compare NaN with anything using <, > or ==, the result is always false. This will mix up the sorting, so defining it as missingValue is maybe a good idea. But as said before, you cannot compare with NaN, it will always return false, so use Double.isNaN(cached.values[doc]) :-)
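
As a rough sketch of the two caveats above (standalone code, not taken from the patch; the class and method names are invented for illustration), the following shows that an equality test against Double.NaN never matches, that Double.isNaN does, and that a real value equal to the sentinel cannot be told apart from a missing one.

```java
final class NaNMissingCheck {
    // Hypothetical helper: substitutes missingValue when the cached value is NaN.
    static double valueOrMissing(double cachedValue, double missingValue) {
        return Double.isNaN(cachedValue) ? missingValue : cachedValue;
    }

    // Broken variant kept for contrast: == never matches NaN,
    // so the cached value is always returned unchanged.
    static double brokenValueOrMissing(double cachedValue, double missingValue) {
        return cachedValue == Double.NaN ? missingValue : cachedValue;
    }

    public static void main(String[] args) {
        double missing = Double.MAX_VALUE;
        System.out.println(valueOrMissing(Double.NaN, missing) == missing);           // true
        System.out.println(Double.isNaN(brokenValueOrMissing(Double.NaN, missing)));  // true

        // Sentinel collision: a document that really stores Integer.MAX_VALUE
        // compares equal to the sentinel used for missing documents.
        int realValue = Integer.MAX_VALUE;
        int missingInt = Integer.MAX_VALUE;
        System.out.println(Integer.compare(realValue, missingInt) == 0);              // true
    }
}
```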

      was (Author: thetaphi):
    I don't expect adding/subtracting anything to infinity changes its value (at least the IEEE-754 specs say it should not change anything). NaN order is not undefined but behaves differently than you expect (it means if you ever compare NaN with anything using <, >, == it will always return false). This will mix up the sorting, so defining it as missingValue is maybe a good idea. But as said before, you cannot compare with NaN, it will always return false, so use Double.isNaN(cached.values[doc]) :-)
  
> Trie* fields should support sortMissingLast=true, and deprecate Sortable* Field Types
> -------------------------------------------------------------------------------------
>
>                 Key: SOLR-2134
>                 URL: https://issues.apache.org/jira/browse/SOLR-2134
>             Project: Solr
>          Issue Type: Improvement
>          Components: Schema and Analysis
>            Reporter: Ryan McKinley
>         Attachments: SOLR-2134-SortMissingLast.patch, SOLR-2134-SortMissingLast.patch, SOLR-2134-SortMissingLast.patch, SOLR-2134-SortMissingLast.patch
>
>
> With the changes in LUCENE-2649, the FieldCache also returns if the bit is valid or not.  This is enough to support sortMissingLast=true with Trie* fields.  Then we can get rid of the Sortable* fields



[jira] Commented: (SOLR-2134) Trie* fields should support sortMissingLast=true, and deprecate Sortable* Field Types

Posted by "Michael McCandless (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-2134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12914995#action_12914995 ] 

Michael McCandless commented on SOLR-2134:
------------------------------------------

This looks great!

So it moves the "sort missing last" capability down into Lucene, and it enables SortField & Comparators to use XXXValuesCreator, so that (I think) an app could plug in its own external source of values (i.e. not use the "uninversion" that FieldCache/ReaderCache uses).

If you pass null for the missing value then the bits are not loaded in the FieldCache, right?  (And then the comparator behaves as it does today, i.e. treats these as 0, though probably we should advertise that the missing value behavior is undefined.)

> Trie* fields should support sortMissingLast=true, and deprecate Sortable* Field Types
> -------------------------------------------------------------------------------------
>
>                 Key: SOLR-2134
>                 URL: https://issues.apache.org/jira/browse/SOLR-2134
>             Project: Solr
>          Issue Type: Improvement
>          Components: Schema and Analysis
>            Reporter: Ryan McKinley
>         Attachments: SOLR-2134-SortMissingLast.patch
>
>
> With the changes in LUCENE-2649, the FieldCache also returns if the bit is valid or not.  This is enough to support sortMissingLast=true with Trie* fields.  Then we can get rid of the Sortable* fields



[jira] Commented: (SOLR-2134) Trie* fields should support sortMissingLast=true, and deprecate Sortable* Field Types

Posted by "Ryan McKinley (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-2134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12915045#action_12915045 ] 

Ryan McKinley commented on SOLR-2134:
-------------------------------------

bq.  If you pass null for the missing value then the bits are not loaded in the FieldCache right?

Passing null as the missing value indicates that the Comparator should not even try looking for missing values.  Setting the missing value will turn on the 'CACHE_BITS' option:
{code:java}
public SortField setMissingValue( Object v )
  {
    missingValue = v;
    if( missingValue != null ) {
      if( this.creator == null ) {
        throw new IllegalArgumentException( "Missing value only works for sort fields with a CachedArray" );
      }

      // Set the flag to get bits 
      creator.setFlag( CachedArrayCreator.OPTION_CACHE_BITS );
    }
    return this;
  }
{code}

But the Comparators only actually substitute the missing value if the valid bits are present in the field cache:
{code:java}
    @Override
    public void copy(int slot, int doc) {
      values[slot] = ( checkMissing && cached.valid != null && !cached.valid.get(doc) )
        ? missingValue : cached.values[doc];
    }
{code}
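
The copy logic above can be tried in isolation with a small stand-in for the cached array. This is only a sketch under assumptions: the Cached class, the use of java.util.BitSet for the valid bits, and the copyAll method are invented for the example and are not the classes from the patch.

```java
import java.util.BitSet;

final class MissingAwareCopy {
    // Hypothetical stand-in for the cached array: values plus optional "valid" bits.
    static final class Cached {
        final int[] values;
        final BitSet valid; // null when the bits were not loaded

        Cached(int[] values, BitSet valid) {
            this.values = values;
            this.valid = valid;
        }
    }

    // Copies every document value into its slot, substituting missingValue
    // only when a missing value was requested AND the valid bits are available.
    static int[] copyAll(Cached cached, Integer missingValue) {
        boolean checkMissing = missingValue != null;
        int[] slots = new int[cached.values.length];
        for (int doc = 0; doc < slots.length; doc++) {
            if (checkMissing && cached.valid != null && !cached.valid.get(doc)) {
                slots[doc] = missingValue;
            } else {
                slots[doc] = cached.values[doc];
            }
        }
        return slots;
    }

    public static void main(String[] args) {
        BitSet valid = new BitSet();
        valid.set(0);
        valid.set(2); // doc 1 has no value; its array entry is the default 0
        Cached cached = new Cached(new int[] {5, 0, 3}, valid);

        int[] withMissing = copyAll(cached, Integer.MAX_VALUE);
        System.out.println(withMissing[1] == Integer.MAX_VALUE); // true

        int[] withoutMissing = copyAll(cached, null);
        System.out.println(withoutMissing[1]);                    // 0
    }
}
```

With a null missing value, or with no valid bits, the document without a value simply keeps the array default of 0, which matches the "treats these as 0" behavior mentioned earlier in the thread.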

I will make a LUCENE JIRA issue for adding 'sort missing first/last' to SortField/XxxxComparator -- and use this issue for the Solr integration.



> Trie* fields should support sortMissingLast=true, and deprecate Sortable* Field Types
> -------------------------------------------------------------------------------------
>
>                 Key: SOLR-2134
>                 URL: https://issues.apache.org/jira/browse/SOLR-2134
>             Project: Solr
>          Issue Type: Improvement
>          Components: Schema and Analysis
>            Reporter: Ryan McKinley
>         Attachments: SOLR-2134-SortMissingLast.patch
>
>
> With the changes in LUCENE-2649, the FieldCache also returns if the bit is valid or not.  This is enough to support sortMissingLast=true with Trie* fields.  Then we can get rid of the Sortable* fields



[jira] Commented: (SOLR-2134) Trie* fields should support sortMissingLast=true, and deprecate Sortable* Field Types

Posted by "Michael McCandless (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-2134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12915826#action_12915826 ] 

Michael McCandless commented on SOLR-2134:
------------------------------------------

Hmmm is that going to be a perf hit (Double.compare) for the common case (no NaNs)?

> Trie* fields should support sortMissingLast=true, and deprecate Sortable* Field Types
> -------------------------------------------------------------------------------------
>
>                 Key: SOLR-2134
>                 URL: https://issues.apache.org/jira/browse/SOLR-2134
>             Project: Solr
>          Issue Type: Improvement
>          Components: Schema and Analysis
>            Reporter: Ryan McKinley
>         Attachments: SOLR-2134-SortMissingLast.patch, SOLR-2134-SortMissingLast.patch, SOLR-2134-SortMissingLast.patch, SOLR-2134-SortMissingLast.patch
>
>
> With the changes in LUCENE-2649, the FieldCache also returns if the bit is valid or not.  This is enough to support sortMissingLast=true with Trie* fields.  Then we can get rid of the Sortable* fields



[jira] Updated: (SOLR-2134) Trie* fields should support sortMissingLast=true, and deprecate Sortable* Field Types

Posted by "Ryan McKinley (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/SOLR-2134?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ryan McKinley updated SOLR-2134:
--------------------------------

    Attachment: SOLR-2134-SortMissingLast.patch

This patch adds 'missingValue' to the Lucene SortField and then passes that on to the FieldComparator.  Setting the 'missingValue' to MIN_VALUE or MAX_VALUE lets you sort the documents that are missing the field either first or last.
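
The effect of the sentinel on the sort order can be sketched outside of Lucene with plain Java. This is an illustration only: the class, the method, and the use of null to mark a document without a value are assumptions of the example, not the patch's API.

```java
import java.util.Arrays;
import java.util.Comparator;

final class SortMissingDemo {
    // Returns document ids sorted ascending by value; values[doc] == null means
    // the document has no value and is sorted as if it held missingValue.
    static Integer[] sortDocs(Integer[] values, int missingValue) {
        Integer[] docs = new Integer[values.length];
        for (int i = 0; i < docs.length; i++) {
            docs[i] = i;
        }
        Comparator<Integer> byValue =
            Comparator.comparingInt(d -> values[d] == null ? missingValue : values[d]);
        Arrays.sort(docs, byValue); // stable, so ties keep document order
        return docs;
    }

    public static void main(String[] args) {
        Integer[] values = {7, null, 2, null};
        // Missing last: prints [2, 0, 1, 3]
        System.out.println(Arrays.toString(sortDocs(values, Integer.MAX_VALUE)));
        // Missing first: prints [1, 3, 2, 0]
        System.out.println(Arrays.toString(sortDocs(values, Integer.MIN_VALUE)));
    }
}
```

Note that this covers ascending order only; with a reversed sort the same sentinel would move the missing documents to the other end.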

Since the majority of the work is actually in Lucene, it may make more sense to have this as a LUCENE issue.

The Solr side still needs some tests, but I wanted to get this out for folks to see.

> Trie* fields should support sortMissingLast=true, and deprecate Sortable* Field Types
> -------------------------------------------------------------------------------------
>
>                 Key: SOLR-2134
>                 URL: https://issues.apache.org/jira/browse/SOLR-2134
>             Project: Solr
>          Issue Type: Improvement
>          Components: Schema and Analysis
>            Reporter: Ryan McKinley
>         Attachments: SOLR-2134-SortMissingLast.patch
>
>
> With the changes in LUCENE-2649, the FieldCache also returns if the bit is valid or not.  This is enough to support sortMissingLast=true with Trie* fields.  Then we can get rid of the Sortable* fields
