You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@lucene.apache.org by "Gilad Barkai (JIRA)" <ji...@apache.org> on 2011/08/22 13:33:29 UTC

[jira] [Created] (LUCENE-3390) Incorrect sort by Numeric (double) values for documents missing the sorting field

Incorrect sort by Numeric (double) values for documents missing the sorting field
---------------------------------------------------------------------------------

                 Key: LUCENE-3390
                 URL: https://issues.apache.org/jira/browse/LUCENE-3390
             Project: Lucene - Java
          Issue Type: Bug
          Components: core/search
    Affects Versions: 3.3
            Reporter: Gilad Barkai
            Priority: Minor


While sorting results over a numeric double field, documents which do not contain a value for the sorting field seem to get 0 (ZERO) value in the sort. 
This behavior is unexpected, as zero is "comparable" to the rest of the values. A better solution would either be allowing the user to define such a "non-value" default, or always bring those document results as the last ones.

Example scenario:
Adding 3 documents, 1st with value 3.5d, 2nd with -10d, and 3rd without any value.
Searching with MatchAllDocsQuery, with sort over that field in descending order yields the docid results of 0, 2, 1.

Example code:
public static void main(String[] args) throws Exception {
	RAMDirectory d = new RAMDirectory();
	IndexWriter w = new IndexWriter(d, new IndexWriterConfig(Version.LUCENE_33, new KeywordAnalyzer()));
	
	// 1st doc, value 3.5d
	Document doc = new Document();
	doc.add(new NumericField("f", Store.YES, true).setDoubleValue(3.5d));
	w.addDocument(doc);
	
	// 2nd doc, value of -10d
	doc = new Document();
	doc.add(new NumericField("f", Store.YES, true).setDoubleValue(-10d));
	w.addDocument(doc);
	
	// 3rd doc, no value at all
	w.addDocument(new Document());
	w.close();

	IndexSearcher s = new IndexSearcher(d);
	Sort sort = new Sort(new SortField("f", SortField.DOUBLE, true));
	TopDocs td = s.search(new MatchAllDocsQuery(), 10, sort);
	for (ScoreDoc sd : td.scoreDocs) {
		System.out.println(sd.doc + ": " + s.doc(sd.doc).get("f"));
	}
	s.close();
	d.close();
}
 

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] [Commented] (LUCENE-3390) Incorrect sort by Numeric values for documents missing the sorting field

Posted by "Michael McCandless (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-3390?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13109611#comment-13109611 ] 

Michael McCandless commented on LUCENE-3390:
--------------------------------------------

Looks great!

> Incorrect sort by Numeric values for documents missing the sorting field
> ------------------------------------------------------------------------
>
>                 Key: LUCENE-3390
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3390
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: core/search
>    Affects Versions: 3.3
>            Reporter: Gilad Barkai
>            Assignee: Uwe Schindler
>            Priority: Minor
>              Labels: double, float, int, long, numeric, sort
>             Fix For: 3.4, 3.5
>
>         Attachments: LUCENE-3390-BitsInterface.patch, LUCENE-3390-BitsInterface.patch, LUCENE-3390-BitsInterface.patch, LUCENE-3390-fix-like-trunk.patch, LUCENE-3390-fix-like-trunk.patch, LUCENE-3390-fix-like-trunk.patch, LUCENE-3390-fix-like-trunk.patch, LUCENE-3390-inverted.patch, LUCENE-3390.patch, SortByDouble.java
>
>
> While sorting results over a numeric field, documents which do not contain a value for the sorting field seem to get 0 (ZERO) value in the sort. (Tested against Double, Float, Int & Long numeric fields ascending and descending order).
> This behavior is unexpected, as zero is "comparable" to the rest of the values. A better solution would either be allowing the user to define such a "non-value" default, or always bring those document results as the last ones.
> Example scenario:
> Adding 3 documents, 1st with value 3.5d, 2nd with -10d, and 3rd without any value.
> Searching with MatchAllDocsQuery, with sort over that field in descending order yields the docid results of 0, 2, 1.
> Asking for the top 2 documents brings the document without any value as the 2nd result - which seems as a bug?

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] [Commented] (LUCENE-3390) Incorrect sort by Numeric values for documents missing the sorting field

Posted by "Doron Cohen (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-3390?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13095409#comment-13095409 ] 

Doron Cohen commented on LUCENE-3390:
-------------------------------------

I think it may be useful to solve this also in 3x - without the cached-array-creators of the trunk, but with similar concept - i.e. an additional cache "type" will cache the docs missing values for certain field, and will allow to use the default value assigned by apps calling setMissingValue() as in trunk. Gilad and I looked at this, will post a patch shortly for review...

> Incorrect sort by Numeric values for documents missing the sorting field
> ------------------------------------------------------------------------
>
>                 Key: LUCENE-3390
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3390
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: core/search
>    Affects Versions: 3.3
>            Reporter: Gilad Barkai
>            Priority: Minor
>              Labels: double, float, int, long, numeric, sort
>         Attachments: SortByDouble.java
>
>
> While sorting results over a numeric field, documents which do not contain a value for the sorting field seem to get 0 (ZERO) value in the sort. (Tested against Double, Float, Int & Long numeric fields ascending and descending order).
> This behavior is unexpected, as zero is "comparable" to the rest of the values. A better solution would either be allowing the user to define such a "non-value" default, or always bring those document results as the last ones.
> Example scenario:
> Adding 3 documents, 1st with value 3.5d, 2nd with -10d, and 3rd without any value.
> Searching with MatchAllDocsQuery, with sort over that field in descending order yields the docid results of 0, 2, 1.
> Asking for the top 2 documents brings the document without any value as the 2nd result - which seems as a bug?

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] [Commented] (LUCENE-3390) Incorrect sort by Numeric values for documents missing the sorting field

Posted by "Michael McCandless (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-3390?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13109402#comment-13109402 ] 

Michael McCandless commented on LUCENE-3390:
--------------------------------------------

I would love to take this even further, and have trunk's FC implement missing values the same way 3.x does (ie, separate FC method to getUnvaluedDocs, rather than bundling this bitset w/ the computation of the values array).  But we should do that separately.

This is actually a serious bug; maybe we should release 3.4.1 soon (this would also fix the Maven packaging problem in 3.4.0).

Why did we need to narrow the return value from FC.getUnvaluedDocs to FixedBitSet?

> Incorrect sort by Numeric values for documents missing the sorting field
> ------------------------------------------------------------------------
>
>                 Key: LUCENE-3390
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3390
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: core/search
>    Affects Versions: 3.3
>            Reporter: Gilad Barkai
>            Assignee: Doron Cohen
>            Priority: Minor
>              Labels: double, float, int, long, numeric, sort
>             Fix For: 3.4
>
>         Attachments: LUCENE-3390-fix-like-trunk.patch, LUCENE-3390-fix-like-trunk.patch, LUCENE-3390-fix-like-trunk.patch, LUCENE-3390-fix-like-trunk.patch, LUCENE-3390.patch, SortByDouble.java
>
>
> While sorting results over a numeric field, documents which do not contain a value for the sorting field seem to get 0 (ZERO) value in the sort. (Tested against Double, Float, Int & Long numeric fields ascending and descending order).
> This behavior is unexpected, as zero is "comparable" to the rest of the values. A better solution would either be allowing the user to define such a "non-value" default, or always bring those document results as the last ones.
> Example scenario:
> Adding 3 documents, 1st with value 3.5d, 2nd with -10d, and 3rd without any value.
> Searching with MatchAllDocsQuery, with sort over that field in descending order yields the docid results of 0, 2, 1.
> Asking for the top 2 documents brings the document without any value as the 2nd result - which seems as a bug?

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] [Updated] (LUCENE-3390) Incorrect sort by Numeric values for documents missing the sorting field

Posted by "Uwe Schindler (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-3390?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Uwe Schindler updated LUCENE-3390:
----------------------------------

    Attachment:     (was: LUCENE-3390-inverted.patch)

> Incorrect sort by Numeric values for documents missing the sorting field
> ------------------------------------------------------------------------
>
>                 Key: LUCENE-3390
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3390
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: core/search
>    Affects Versions: 3.3
>            Reporter: Gilad Barkai
>            Assignee: Uwe Schindler
>            Priority: Minor
>              Labels: double, float, int, long, numeric, sort
>             Fix For: 3.4, 3.5
>
>         Attachments: LUCENE-3390-BitsInterface.patch, LUCENE-3390-BitsInterface.patch, LUCENE-3390-BitsInterface.patch, LUCENE-3390-fix-like-trunk.patch, LUCENE-3390-fix-like-trunk.patch, LUCENE-3390-fix-like-trunk.patch, LUCENE-3390-fix-like-trunk.patch, LUCENE-3390-inverted.patch, LUCENE-3390.patch, SortByDouble.java
>
>
> While sorting results over a numeric field, documents which do not contain a value for the sorting field seem to get 0 (ZERO) value in the sort. (Tested against Double, Float, Int & Long numeric fields ascending and descending order).
> This behavior is unexpected, as zero is "comparable" to the rest of the values. A better solution would either be allowing the user to define such a "non-value" default, or always bring those document results as the last ones.
> Example scenario:
> Adding 3 documents, 1st with value 3.5d, 2nd with -10d, and 3rd without any value.
> Searching with MatchAllDocsQuery, with sort over that field in descending order yields the docid results of 0, 2, 1.
> Asking for the top 2 documents brings the document without any value as the 2nd result - which seems as a bug?

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] [Updated] (LUCENE-3390) Incorrect sort by Numeric values for documents missing the sorting field

Posted by "Uwe Schindler (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-3390?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Uwe Schindler updated LUCENE-3390:
----------------------------------

    Attachment: LUCENE-3390-fix-like-trunk.patch

Here a quick fix:
- Uses same code like trunk inside the comparators
- Uses FixedBitSet instead of OpenBitSet

Unfortunately this breaks backwards compatibility as explained before.

This patch needs some cleanup (remove code duplication and maybe move the missingValue code to a numeric base class like in trunk). This is also broken in current code, it has missing value code in FieldComparator base class, which should only be in a NumericFieldComparator abstract base class (missing in 3.x).

> Incorrect sort by Numeric values for documents missing the sorting field
> ------------------------------------------------------------------------
>
>                 Key: LUCENE-3390
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3390
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: core/search
>    Affects Versions: 3.3
>            Reporter: Gilad Barkai
>            Assignee: Doron Cohen
>            Priority: Minor
>              Labels: double, float, int, long, numeric, sort
>             Fix For: 3.4
>
>         Attachments: LUCENE-3390-fix-like-trunk.patch, LUCENE-3390.patch, SortByDouble.java
>
>
> While sorting results over a numeric field, documents which do not contain a value for the sorting field seem to get 0 (ZERO) value in the sort. (Tested against Double, Float, Int & Long numeric fields ascending and descending order).
> This behavior is unexpected, as zero is "comparable" to the rest of the values. A better solution would either be allowing the user to define such a "non-value" default, or always bring those document results as the last ones.
> Example scenario:
> Adding 3 documents, 1st with value 3.5d, 2nd with -10d, and 3rd without any value.
> Searching with MatchAllDocsQuery, with sort over that field in descending order yields the docid results of 0, 2, 1.
> Asking for the top 2 documents brings the document without any value as the 2nd result - which seems as a bug?

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] [Commented] (LUCENE-3390) Incorrect sort by Numeric values for documents missing the sorting field

Posted by "Robert Muir (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-3390?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13097168#comment-13097168 ] 

Robert Muir commented on LUCENE-3390:
-------------------------------------

+1 to revisit how this was done in trunk.

> Incorrect sort by Numeric values for documents missing the sorting field
> ------------------------------------------------------------------------
>
>                 Key: LUCENE-3390
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3390
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: core/search
>    Affects Versions: 3.3
>            Reporter: Gilad Barkai
>            Assignee: Doron Cohen
>            Priority: Minor
>              Labels: double, float, int, long, numeric, sort
>             Fix For: 3.4
>
>         Attachments: LUCENE-3390.patch, SortByDouble.java
>
>
> While sorting results over a numeric field, documents which do not contain a value for the sorting field seem to get 0 (ZERO) value in the sort. (Tested against Double, Float, Int & Long numeric fields ascending and descending order).
> This behavior is unexpected, as zero is "comparable" to the rest of the values. A better solution would either be allowing the user to define such a "non-value" default, or always bring those document results as the last ones.
> Example scenario:
> Adding 3 documents, 1st with value 3.5d, 2nd with -10d, and 3rd without any value.
> Searching with MatchAllDocsQuery, with sort over that field in descending order yields the docid results of 0, 2, 1.
> Asking for the top 2 documents brings the document without any value as the 2nd result - which seems as a bug?

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] [Commented] (LUCENE-3390) Incorrect sort by Numeric values for documents missing the sorting field

Posted by "Michael McCandless (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-3390?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13097164#comment-13097164 ] 

Michael McCandless commented on LUCENE-3390:
--------------------------------------------

Also, can we use FastBitSet, not OpenBitSet, here?

> Incorrect sort by Numeric values for documents missing the sorting field
> ------------------------------------------------------------------------
>
>                 Key: LUCENE-3390
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3390
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: core/search
>    Affects Versions: 3.3
>            Reporter: Gilad Barkai
>            Assignee: Doron Cohen
>            Priority: Minor
>              Labels: double, float, int, long, numeric, sort
>             Fix For: 3.4
>
>         Attachments: LUCENE-3390.patch, SortByDouble.java
>
>
> While sorting results over a numeric field, documents which do not contain a value for the sorting field seem to get 0 (ZERO) value in the sort. (Tested against Double, Float, Int & Long numeric fields ascending and descending order).
> This behavior is unexpected, as zero is "comparable" to the rest of the values. A better solution would either be allowing the user to define such a "non-value" default, or always bring those document results as the last ones.
> Example scenario:
> Adding 3 documents, 1st with value 3.5d, 2nd with -10d, and 3rd without any value.
> Searching with MatchAllDocsQuery, with sort over that field in descending order yields the docid results of 0, 2, 1.
> Asking for the top 2 documents brings the document without any value as the 2nd result - which seems as a bug?

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] [Commented] (LUCENE-3390) Incorrect sort by Numeric values for documents missing the sorting field

Posted by "Uwe Schindler (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-3390?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13109404#comment-13109404 ] 

Uwe Schindler commented on LUCENE-3390:
---------------------------------------

bq. Why did we need to narrow the return value from FC.getUnvaluedDocs to FixedBitSet?

We have no Bits interface in 3.x. And DocIdSet is not random access. Maybe we should backport the Bits interface?

> Incorrect sort by Numeric values for documents missing the sorting field
> ------------------------------------------------------------------------
>
>                 Key: LUCENE-3390
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3390
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: core/search
>    Affects Versions: 3.3
>            Reporter: Gilad Barkai
>            Assignee: Doron Cohen
>            Priority: Minor
>              Labels: double, float, int, long, numeric, sort
>             Fix For: 3.4
>
>         Attachments: LUCENE-3390-fix-like-trunk.patch, LUCENE-3390-fix-like-trunk.patch, LUCENE-3390-fix-like-trunk.patch, LUCENE-3390-fix-like-trunk.patch, LUCENE-3390.patch, SortByDouble.java
>
>
> While sorting results over a numeric field, documents which do not contain a value for the sorting field seem to get 0 (ZERO) value in the sort. (Tested against Double, Float, Int & Long numeric fields ascending and descending order).
> This behavior is unexpected, as zero is "comparable" to the rest of the values. A better solution would either be allowing the user to define such a "non-value" default, or always bring those document results as the last ones.
> Example scenario:
> Adding 3 documents, 1st with value 3.5d, 2nd with -10d, and 3rd without any value.
> Searching with MatchAllDocsQuery, with sort over that field in descending order yields the docid results of 0, 2, 1.
> Asking for the top 2 documents brings the document without any value as the 2nd result - which seems as a bug?

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] [Commented] (LUCENE-3390) Incorrect sort by Numeric values for documents missing the sorting field

Posted by "Michael McCandless (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-3390?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13109434#comment-13109434 ] 

Michael McCandless commented on LUCENE-3390:
--------------------------------------------

Looks great Uwe!  I think we can assert that the cardinality is <= numDocs, and then short-circuit the common == numDocs (all docs have values) case like you are.

I love how 3.x handles the unvalued bits... I think we should port this forward to trunk, but maybe make it possible to set the bits as we build up the values (single pass) if you specify up front you want the bit set.  I'll open a new issue for this...

> Incorrect sort by Numeric values for documents missing the sorting field
> ------------------------------------------------------------------------
>
>                 Key: LUCENE-3390
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3390
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: core/search
>    Affects Versions: 3.3
>            Reporter: Gilad Barkai
>            Assignee: Doron Cohen
>            Priority: Minor
>              Labels: double, float, int, long, numeric, sort
>             Fix For: 3.4
>
>         Attachments: LUCENE-3390-BitsInterface.patch, LUCENE-3390-fix-like-trunk.patch, LUCENE-3390-fix-like-trunk.patch, LUCENE-3390-fix-like-trunk.patch, LUCENE-3390-fix-like-trunk.patch, LUCENE-3390.patch, SortByDouble.java
>
>
> While sorting results over a numeric field, documents which do not contain a value for the sorting field seem to get 0 (ZERO) value in the sort. (Tested against Double, Float, Int & Long numeric fields ascending and descending order).
> This behavior is unexpected, as zero is "comparable" to the rest of the values. A better solution would either be allowing the user to define such a "non-value" default, or always bring those document results as the last ones.
> Example scenario:
> Adding 3 documents, 1st with value 3.5d, 2nd with -10d, and 3rd without any value.
> Searching with MatchAllDocsQuery, with sort over that field in descending order yields the docid results of 0, 2, 1.
> Asking for the top 2 documents brings the document without any value as the 2nd result - which seems as a bug?

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] [Resolved] (LUCENE-3390) Incorrect sort by Numeric values for documents missing the sorting field

Posted by "Uwe Schindler (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-3390?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Uwe Schindler resolved LUCENE-3390.
-----------------------------------

    Resolution: Fixed

Committed 3.x branch revision: 1173745

> Incorrect sort by Numeric values for documents missing the sorting field
> ------------------------------------------------------------------------
>
>                 Key: LUCENE-3390
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3390
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: core/search
>    Affects Versions: 3.3
>            Reporter: Gilad Barkai
>            Assignee: Uwe Schindler
>            Priority: Minor
>              Labels: double, float, int, long, numeric, sort
>             Fix For: 3.5, 3.4
>
>         Attachments: LUCENE-3390-BitsInterface.patch, LUCENE-3390-BitsInterface.patch, LUCENE-3390-BitsInterface.patch, LUCENE-3390-fix-like-trunk.patch, LUCENE-3390-fix-like-trunk.patch, LUCENE-3390-fix-like-trunk.patch, LUCENE-3390-fix-like-trunk.patch, LUCENE-3390-inverted.patch, LUCENE-3390.patch, SortByDouble.java
>
>
> While sorting results over a numeric field, documents which do not contain a value for the sorting field seem to get 0 (ZERO) value in the sort. (Tested against Double, Float, Int & Long numeric fields ascending and descending order).
> This behavior is unexpected, as zero is "comparable" to the rest of the values. A better solution would either be allowing the user to define such a "non-value" default, or always bring those document results as the last ones.
> Example scenario:
> Adding 3 documents, 1st with value 3.5d, 2nd with -10d, and 3rd without any value.
> Searching with MatchAllDocsQuery, with sort over that field in descending order yields the docid results of 0, 2, 1.
> Asking for the top 2 documents brings the document without any value as the 2nd result - which seems as a bug?

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] [Commented] (LUCENE-3390) Incorrect sort by Numeric values for documents missing the sorting field

Posted by "Michael McCandless (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-3390?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13109556#comment-13109556 ] 

Michael McCandless commented on LUCENE-3390:
--------------------------------------------

bq. I think that's ready to commit.

+1, looks great!  Thanks Uwe.

> Incorrect sort by Numeric values for documents missing the sorting field
> ------------------------------------------------------------------------
>
>                 Key: LUCENE-3390
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3390
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: core/search
>    Affects Versions: 3.3
>            Reporter: Gilad Barkai
>            Assignee: Uwe Schindler
>            Priority: Minor
>              Labels: double, float, int, long, numeric, sort
>             Fix For: 3.4
>
>         Attachments: LUCENE-3390-BitsInterface.patch, LUCENE-3390-BitsInterface.patch, LUCENE-3390-BitsInterface.patch, LUCENE-3390-fix-like-trunk.patch, LUCENE-3390-fix-like-trunk.patch, LUCENE-3390-fix-like-trunk.patch, LUCENE-3390-fix-like-trunk.patch, LUCENE-3390.patch, SortByDouble.java
>
>
> While sorting results over a numeric field, documents which do not contain a value for the sorting field seem to get 0 (ZERO) value in the sort. (Tested against Double, Float, Int & Long numeric fields ascending and descending order).
> This behavior is unexpected, as zero is "comparable" to the rest of the values. A better solution would either be allowing the user to define such a "non-value" default, or always bring those document results as the last ones.
> Example scenario:
> Adding 3 documents, 1st with value 3.5d, 2nd with -10d, and 3rd without any value.
> Searching with MatchAllDocsQuery, with sort over that field in descending order yields the docid results of 0, 2, 1.
> Asking for the top 2 documents brings the document without any value as the 2nd result - which seems as a bug?

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] [Commented] (LUCENE-3390) Incorrect sort by Numeric values for documents missing the sorting field

Posted by "Michael McCandless (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-3390?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13088852#comment-13088852 ] 

Michael McCandless commented on LUCENE-3390:
--------------------------------------------

In Lucene trunk I believe this (how/where a doc that's missing the fields sorts) is now controllable: you can call SortField.setMissingValue.

> Incorrect sort by Numeric values for documents missing the sorting field
> ------------------------------------------------------------------------
>
>                 Key: LUCENE-3390
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3390
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: core/search
>    Affects Versions: 3.3
>            Reporter: Gilad Barkai
>            Priority: Minor
>              Labels: double, float, int, long, numeric, sort
>         Attachments: SortByDouble.java
>
>
> While sorting results over a numeric field, documents which do not contain a value for the sorting field seem to get 0 (ZERO) value in the sort. (Tested against Double, Float, Int & Long numeric fields ascending and descending order).
> This behavior is unexpected, as zero is "comparable" to the rest of the values. A better solution would either be allowing the user to define such a "non-value" default, or always bring those document results as the last ones.
> Example scenario:
> Adding 3 documents, 1st with value 3.5d, 2nd with -10d, and 3rd without any value.
> Searching with MatchAllDocsQuery, with sort over that field in descending order yields the docid results of 0, 2, 1.
> Asking for the top 2 documents brings the document without any value as the 2nd result - which seems as a bug?

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] [Commented] (LUCENE-3390) Incorrect sort by Numeric values for documents missing the sorting field

Posted by "Doron Cohen (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-3390?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13109454#comment-13109454 ] 

Doron Cohen commented on LUCENE-3390:
-------------------------------------

I wrote a small test that should fail with the bug Uwe fixed here and pass with the fix. For some reason it is still failing even with that fix. Tried this with previous patch, will now try with last one, though I think it it should pass also with previous one. I'll give it another try.

> Incorrect sort by Numeric values for documents missing the sorting field
> ------------------------------------------------------------------------
>
>                 Key: LUCENE-3390
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3390
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: core/search
>    Affects Versions: 3.3
>            Reporter: Gilad Barkai
>            Assignee: Doron Cohen
>            Priority: Minor
>              Labels: double, float, int, long, numeric, sort
>             Fix For: 3.4
>
>         Attachments: LUCENE-3390-BitsInterface.patch, LUCENE-3390-fix-like-trunk.patch, LUCENE-3390-fix-like-trunk.patch, LUCENE-3390-fix-like-trunk.patch, LUCENE-3390-fix-like-trunk.patch, LUCENE-3390.patch, SortByDouble.java
>
>
> While sorting results over a numeric field, documents which do not contain a value for the sorting field seem to get 0 (ZERO) value in the sort. (Tested against Double, Float, Int & Long numeric fields ascending and descending order).
> This behavior is unexpected, as zero is "comparable" to the rest of the values. A better solution would either be allowing the user to define such a "non-value" default, or always bring those document results as the last ones.
> Example scenario:
> Adding 3 documents, 1st with value 3.5d, 2nd with -10d, and 3rd without any value.
> Searching with MatchAllDocsQuery, with sort over that field in descending order yields the docid results of 0, 2, 1.
> Asking for the top 2 documents brings the document without any value as the 2nd result - which seems as a bug?

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] [Updated] (LUCENE-3390) Incorrect sort by Numeric values for documents missing the sorting field

Posted by "Uwe Schindler (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-3390?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Uwe Schindler updated LUCENE-3390:
----------------------------------

    Attachment: LUCENE-3390-BitsInterface.patch

I added a further test in TestFieldCache to check the Bits returned.

I think that's ready to commit.

> Incorrect sort by Numeric values for documents missing the sorting field
> ------------------------------------------------------------------------
>
>                 Key: LUCENE-3390
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3390
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: core/search
>    Affects Versions: 3.3
>            Reporter: Gilad Barkai
>            Assignee: Doron Cohen
>            Priority: Minor
>              Labels: double, float, int, long, numeric, sort
>             Fix For: 3.4
>
>         Attachments: LUCENE-3390-BitsInterface.patch, LUCENE-3390-BitsInterface.patch, LUCENE-3390-BitsInterface.patch, LUCENE-3390-fix-like-trunk.patch, LUCENE-3390-fix-like-trunk.patch, LUCENE-3390-fix-like-trunk.patch, LUCENE-3390-fix-like-trunk.patch, LUCENE-3390.patch, SortByDouble.java
>
>
> While sorting results over a numeric field, documents which do not contain a value for the sorting field seem to get 0 (ZERO) value in the sort. (Tested against Double, Float, Int & Long numeric fields ascending and descending order).
> This behavior is unexpected, as zero is "comparable" to the rest of the values. A better solution would either be allowing the user to define such a "non-value" default, or always bring those document results as the last ones.
> Example scenario:
> Adding 3 documents, 1st with value 3.5d, 2nd with -10d, and 3rd without any value.
> Searching with MatchAllDocsQuery, with sort over that field in descending order yields the docid results of 0, 2, 1.
> Asking for the top 2 documents brings the document without any value as the 2nd result - which seems as a bug?

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] [Commented] (LUCENE-3390) Incorrect sort by Numeric values for documents missing the sorting field

Posted by "Hoss Man (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-3390?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13088832#comment-13088832 ] 

Hoss Man commented on LUCENE-3390:
----------------------------------

This is a fairly well known behavior of the FieldCache in Lucene (and all sorting uses FieldCache) ... i'm guessing it's probably not documented very well however.

There are a bunch of open issues that relate to trying to rectify this problem - but the one known work arround at this point is to *not* use NumericField, and use StrIndex based FieldCache (which does understand docs w/o values).  This is the approach used in Solr for the "Sortable*Field" numerics.

> Incorrect sort by Numeric values for documents missing the sorting field
> ------------------------------------------------------------------------
>
>                 Key: LUCENE-3390
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3390
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: core/search
>    Affects Versions: 3.3
>            Reporter: Gilad Barkai
>            Priority: Minor
>              Labels: double, float, int, long, numeric, sort
>         Attachments: SortByDouble.java
>
>
> While sorting results over a numeric field, documents which do not contain a value for the sorting field seem to get 0 (ZERO) value in the sort. (Tested against Double, Float, Int & Long numeric fields ascending and descending order).
> This behavior is unexpected, as zero is "comparable" to the rest of the values. A better solution would either be allowing the user to define such a "non-value" default, or always bring those document results as the last ones.
> Example scenario:
> Adding 3 documents, 1st with value 3.5d, 2nd with -10d, and 3rd without any value.
> Searching with MatchAllDocsQuery, with sort over that field in descending order yields the docid results of 0, 2, 1.
> Asking for the top 2 documents brings the document without any value as the 2nd result - which seems as a bug?

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] [Commented] (LUCENE-3390) Incorrect sort by Numeric values for documents missing the sorting field

Posted by "Michael McCandless (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-3390?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13097161#comment-13097161 ] 

Michael McCandless commented on LUCENE-3390:
--------------------------------------------

I like how we solved this in 3.x!  Ie, a whole separate entry for holding a bitset indicating if the doc has a value.

This is generally useful, alone, ie one can just pull this bitset and use it directly.

It's also nice because it's one source that computes this, vs N copies (one per value) that we have on trunk.

I guess the downside is it takes 2 passes over the terms (one to get the values, another to fill this bitset), but maybe that tradeoff is worth not duplicating the code all over... maybe we should take a similar approach in trunk?

> Incorrect sort by Numeric values for documents missing the sorting field
> ------------------------------------------------------------------------
>
>                 Key: LUCENE-3390
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3390
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: core/search
>    Affects Versions: 3.3
>            Reporter: Gilad Barkai
>            Assignee: Doron Cohen
>            Priority: Minor
>              Labels: double, float, int, long, numeric, sort
>             Fix For: 3.4
>
>         Attachments: LUCENE-3390.patch, SortByDouble.java
>
>
> While sorting results over a numeric field, documents which do not contain a value for the sorting field seem to get 0 (ZERO) value in the sort. (Tested against Double, Float, Int & Long numeric fields ascending and descending order).
> This behavior is unexpected, as zero is "comparable" to the rest of the values. A better solution would either be allowing the user to define such a "non-value" default, or always bring those document results as the last ones.
> Example scenario:
> Adding 3 documents, 1st with value 3.5d, 2nd with -10d, and 3rd without any value.
> Searching with MatchAllDocsQuery, with sort over that field in descending order yields the docid results of 0, 2, 1.
> Asking for the top 2 documents brings the document without any value as the 2nd result - which seems as a bug?

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] [Updated] (LUCENE-3390) Incorrect sort by Numeric values for documents missing the sorting field

Posted by "Uwe Schindler (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-3390?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Uwe Schindler updated LUCENE-3390:
----------------------------------

    Attachment: LUCENE-3390-inverted.patch

> Incorrect sort by Numeric values for documents missing the sorting field
> ------------------------------------------------------------------------
>
>                 Key: LUCENE-3390
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3390
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: core/search
>    Affects Versions: 3.3
>            Reporter: Gilad Barkai
>            Assignee: Uwe Schindler
>            Priority: Minor
>              Labels: double, float, int, long, numeric, sort
>             Fix For: 3.4, 3.5
>
>         Attachments: LUCENE-3390-BitsInterface.patch, LUCENE-3390-BitsInterface.patch, LUCENE-3390-BitsInterface.patch, LUCENE-3390-fix-like-trunk.patch, LUCENE-3390-fix-like-trunk.patch, LUCENE-3390-fix-like-trunk.patch, LUCENE-3390-fix-like-trunk.patch, LUCENE-3390-inverted.patch, LUCENE-3390.patch, SortByDouble.java
>
>
> While sorting results over a numeric field, documents which do not contain a value for the sorting field seem to get 0 (ZERO) value in the sort. (Tested against Double, Float, Int & Long numeric fields ascending and descending order).
> This behavior is unexpected, as zero is "comparable" to the rest of the values. A better solution would either be allowing the user to define such a "non-value" default, or always bring those document results as the last ones.
> Example scenario:
> Adding 3 documents, 1st with value 3.5d, 2nd with -10d, and 3rd without any value.
> Searching with MatchAllDocsQuery, with sort over that field in descending order yields the docid results of 0, 2, 1.
> Asking for the top 2 documents brings the document without any value as the 2nd result - which seems as a bug?

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] [Updated] (LUCENE-3390) Incorrect sort by Numeric values for documents missing the sorting field

Posted by "Uwe Schindler (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-3390?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Uwe Schindler updated LUCENE-3390:
----------------------------------

    Attachment: LUCENE-3390-fix-like-trunk.patch

Small change:
- added a TODO: In my opinion the mentioned check is bogus and can never work. IR.docFreq(new Term(field)) will never return the sum of all docFreqs from a field.
- Moved the empty bitmask check down and use cardinality().

I am currently looking into a possibility to remove the additional uninversion, if the FieldCache already uninverted the field. The bitset should be populated together with the values, but still stored as a separate cache entry.

> Incorrect sort by Numeric values for documents missing the sorting field
> ------------------------------------------------------------------------
>
>                 Key: LUCENE-3390
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3390
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: core/search
>    Affects Versions: 3.3
>            Reporter: Gilad Barkai
>            Assignee: Doron Cohen
>            Priority: Minor
>              Labels: double, float, int, long, numeric, sort
>             Fix For: 3.4
>
>         Attachments: LUCENE-3390-fix-like-trunk.patch, LUCENE-3390-fix-like-trunk.patch, LUCENE-3390-fix-like-trunk.patch, LUCENE-3390.patch, SortByDouble.java
>
>
> While sorting results over a numeric field, documents which do not contain a value for the sorting field seem to get 0 (ZERO) value in the sort. (Tested against Double, Float, Int & Long numeric fields ascending and descending order).
> This behavior is unexpected, as zero is "comparable" to the rest of the values. A better solution would either be allowing the user to define such a "non-value" default, or always bring those document results as the last ones.
> Example scenario:
> Adding 3 documents, 1st with value 3.5d, 2nd with -10d, and 3rd without any value.
> Searching with MatchAllDocsQuery, with sort over that field in descending order yields the docid results of 0, 2, 1.
> Asking for the top 2 documents brings the document without any value as the 2nd result - which seems as a bug?

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] [Assigned] (LUCENE-3390) Incorrect sort by Numeric values for documents missing the sorting field

Posted by "Uwe Schindler (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-3390?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Uwe Schindler reassigned LUCENE-3390:
-------------------------------------

    Assignee: Uwe Schindler  (was: Doron Cohen)

> Incorrect sort by Numeric values for documents missing the sorting field
> ------------------------------------------------------------------------
>
>                 Key: LUCENE-3390
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3390
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: core/search
>    Affects Versions: 3.3
>            Reporter: Gilad Barkai
>            Assignee: Uwe Schindler
>            Priority: Minor
>              Labels: double, float, int, long, numeric, sort
>             Fix For: 3.4
>
>         Attachments: LUCENE-3390-BitsInterface.patch, LUCENE-3390-BitsInterface.patch, LUCENE-3390-BitsInterface.patch, LUCENE-3390-fix-like-trunk.patch, LUCENE-3390-fix-like-trunk.patch, LUCENE-3390-fix-like-trunk.patch, LUCENE-3390-fix-like-trunk.patch, LUCENE-3390.patch, SortByDouble.java
>
>
> While sorting results over a numeric field, documents which do not contain a value for the sorting field seem to get 0 (ZERO) value in the sort. (Tested against Double, Float, Int & Long numeric fields ascending and descending order).
> This behavior is unexpected, as zero is "comparable" to the rest of the values. A better solution would either be allowing the user to define such a "non-value" default, or always bring those document results as the last ones.
> Example scenario:
> Adding 3 documents, 1st with value 3.5d, 2nd with -10d, and 3rd without any value.
> Searching with MatchAllDocsQuery, with sort over that field in descending order yields the docid results of 0, 2, 1.
> Asking for the top 2 documents brings the document without any value as the 2nd result - which seems as a bug?

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] [Updated] (LUCENE-3390) Incorrect sort by Numeric (double) values for documents missing the sorting field

Posted by "Gilad Barkai (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-3390?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Gilad Barkai updated LUCENE-3390:
---------------------------------

    Description: 
While sorting results over a numeric double field, documents which do not contain a value for the sorting field seem to get 0 (ZERO) value in the sort. 
This behavior is unexpected, as zero is "comparable" to the rest of the values. A better solution would either be allowing the user to define such a "non-value" default, or always bring those document results as the last ones.

Example scenario:
Adding 3 documents, 1st with value 3.5d, 2nd with -10d, and 3rd without any value.
Searching with MatchAllDocsQuery, with sort over that field in descending order yields the docid results of 0, 2, 1.

While the document with the missing value does match the query, I would expect it to come last, as it is not comparable by the other documents. For example, asking for the top 2 documents brings the document without any value which seems as a bug?

  was:
While sorting results over a numeric double field, documents which do not contain a value for the sorting field seem to get 0 (ZERO) value in the sort. 
This behavior is unexpected, as zero is "comparable" to the rest of the values. A better solution would either be allowing the user to define such a "non-value" default, or always bring those document results as the last ones.

Example scenario:
Adding 3 documents, 1st with value 3.5d, 2nd with -10d, and 3rd without any value.
Searching with MatchAllDocsQuery, with sort over that field in descending order yields the docid results of 0, 2, 1.

Example code:
public static void main(String[] args) throws Exception {
	RAMDirectory d = new RAMDirectory();
	IndexWriter w = new IndexWriter(d, new IndexWriterConfig(Version.LUCENE_33, new KeywordAnalyzer()));
	
	// 1st doc, value 3.5d
	Document doc = new Document();
	doc.add(new NumericField("f", Store.YES, true).setDoubleValue(3.5d));
	w.addDocument(doc);
	
	// 2nd doc, value of -10d
	doc = new Document();
	doc.add(new NumericField("f", Store.YES, true).setDoubleValue(-10d));
	w.addDocument(doc);
	
	// 3rd doc, no value at all
	w.addDocument(new Document());
	w.close();

	IndexSearcher s = new IndexSearcher(d);
	Sort sort = new Sort(new SortField("f", SortField.DOUBLE, true));
	TopDocs td = s.search(new MatchAllDocsQuery(), 10, sort);
	for (ScoreDoc sd : td.scoreDocs) {
		System.out.println(sd.doc + ": " + s.doc(sd.doc).get("f"));
	}
	s.close();
	d.close();
}
 


> Incorrect sort by Numeric (double) values for documents missing the sorting field
> ---------------------------------------------------------------------------------
>
>                 Key: LUCENE-3390
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3390
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: core/search
>    Affects Versions: 3.3
>            Reporter: Gilad Barkai
>            Priority: Minor
>              Labels: double, numeric, sort
>         Attachments: SortByDouble.java
>
>
> While sorting results over a numeric double field, documents which do not contain a value for the sorting field seem to get 0 (ZERO) value in the sort. 
> This behavior is unexpected, as zero is "comparable" to the rest of the values. A better solution would either be allowing the user to define such a "non-value" default, or always bring those document results as the last ones.
> Example scenario:
> Adding 3 documents, 1st with value 3.5d, 2nd with -10d, and 3rd without any value.
> Searching with MatchAllDocsQuery, with sort over that field in descending order yields the docid results of 0, 2, 1.
> While the document with the missing value does match the query, I would expect it to come last, as it is not comparable by the other documents. For example, asking for the top 2 documents brings the document without any value which seems as a bug?

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] [Updated] (LUCENE-3390) Incorrect sort by Numeric values for documents missing the sorting field

Posted by "Uwe Schindler (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-3390?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Uwe Schindler updated LUCENE-3390:
----------------------------------

    Attachment: LUCENE-3390-fix-like-trunk.patch

Here my final fix. The code in the comparator is exactly like trunk, the NumericComparator superclass was added and the API is now similar to trunk. Also fixed the remaining TODO.

With this code we break the already broken 3.4 code, but make it similar to trunk, so migration is easier again. It is unlikely that somebody is affected by this, in most cases a simple recompilation fixes the problem.

> Incorrect sort by Numeric values for documents missing the sorting field
> ------------------------------------------------------------------------
>
>                 Key: LUCENE-3390
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3390
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: core/search
>    Affects Versions: 3.3
>            Reporter: Gilad Barkai
>            Assignee: Doron Cohen
>            Priority: Minor
>              Labels: double, float, int, long, numeric, sort
>             Fix For: 3.4
>
>         Attachments: LUCENE-3390-fix-like-trunk.patch, LUCENE-3390-fix-like-trunk.patch, LUCENE-3390.patch, SortByDouble.java
>
>
> While sorting results over a numeric field, documents which do not contain a value for the sorting field seem to get 0 (ZERO) value in the sort. (Tested against Double, Float, Int & Long numeric fields ascending and descending order).
> This behavior is unexpected, as zero is "comparable" to the rest of the values. A better solution would either be allowing the user to define such a "non-value" default, or always bring those document results as the last ones.
> Example scenario:
> Adding 3 documents, 1st with value 3.5d, 2nd with -10d, and 3rd without any value.
> Searching with MatchAllDocsQuery, with sort over that field in descending order yields the docid results of 0, 2, 1.
> Asking for the top 2 documents brings the document without any value as the 2nd result - which seems as a bug?

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] [Resolved] (LUCENE-3390) Incorrect sort by Numeric values for documents missing the sorting field

Posted by "Doron Cohen (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-3390?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Doron Cohen resolved LUCENE-3390.
---------------------------------

       Resolution: Fixed
    Fix Version/s: 3.4
    Lucene Fields: [Patch Available]  (was: [New])

Fixed in 3.x r1164794.
Thanks Gilad!

> Incorrect sort by Numeric values for documents missing the sorting field
> ------------------------------------------------------------------------
>
>                 Key: LUCENE-3390
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3390
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: core/search
>    Affects Versions: 3.3
>            Reporter: Gilad Barkai
>            Assignee: Doron Cohen
>            Priority: Minor
>              Labels: double, float, int, long, numeric, sort
>             Fix For: 3.4
>
>         Attachments: LUCENE-3390.patch, SortByDouble.java
>
>
> While sorting results over a numeric field, documents which do not contain a value for the sorting field seem to get 0 (ZERO) value in the sort. (Tested against Double, Float, Int & Long numeric fields ascending and descending order).
> This behavior is unexpected, as zero is "comparable" to the rest of the values. A better solution would either be allowing the user to define such a "non-value" default, or always bring those document results as the last ones.
> Example scenario:
> Adding 3 documents, 1st with value 3.5d, 2nd with -10d, and 3rd without any value.
> Searching with MatchAllDocsQuery, with sort over that field in descending order yields the docid results of 0, 2, 1.
> Asking for the top 2 documents brings the document without any value as the 2nd result - which seems as a bug?

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] [Updated] (LUCENE-3390) Incorrect sort by Numeric values for documents missing the sorting field

Posted by "Uwe Schindler (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-3390?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Uwe Schindler updated LUCENE-3390:
----------------------------------

    Attachment: LUCENE-3390-fix-like-trunk.patch

Final patch:
- Added changes for backwards breaks
- Removed the bogus docFreq check
- Optimized the case of empty unvalued docs bit set (like in trunk)

This patch is now 100% in line with trunk. The code was already tested in trunk and does not affect sort speed for the common case without missing value, as the compiler will ignore the additional null check.

Will commit later this day.

> Incorrect sort by Numeric values for documents missing the sorting field
> ------------------------------------------------------------------------
>
>                 Key: LUCENE-3390
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3390
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: core/search
>    Affects Versions: 3.3
>            Reporter: Gilad Barkai
>            Assignee: Doron Cohen
>            Priority: Minor
>              Labels: double, float, int, long, numeric, sort
>             Fix For: 3.4
>
>         Attachments: LUCENE-3390-fix-like-trunk.patch, LUCENE-3390-fix-like-trunk.patch, LUCENE-3390-fix-like-trunk.patch, LUCENE-3390-fix-like-trunk.patch, LUCENE-3390.patch, SortByDouble.java
>
>
> While sorting results over a numeric field, documents which do not contain a value for the sorting field seem to get 0 (ZERO) value in the sort. (Tested against Double, Float, Int & Long numeric fields ascending and descending order).
> This behavior is unexpected, as zero is "comparable" to the rest of the values. A better solution would either be allowing the user to define such a "non-value" default, or always bring those document results as the last ones.
> Example scenario:
> Adding 3 documents, 1st with value 3.5d, 2nd with -10d, and 3rd without any value.
> Searching with MatchAllDocsQuery, with sort over that field in descending order yields the docid results of 0, 2, 1.
> Asking for the top 2 documents brings the document without any value as the 2nd result - which seems as a bug?

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] [Commented] (LUCENE-3390) Incorrect sort by Numeric values for documents missing the sorting field

Posted by "Doron Cohen (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-3390?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13108490#comment-13108490 ] 

Doron Cohen commented on LUCENE-3390:
-------------------------------------

Hi Uwe, thanks for catching this. 
I agree that this is a bug, and needs to be fixed.
Just to make sure that we agree on what the problem is, let me describe it again: in current 3x code in setNextReader() we extract the values from the cache, e.g. by {code}FieldCache.DEFAULT.getDoubles(reader, field, parser);{code} and, if a missing value was set, we iterate the unvalued docs and set them to that missing value. However this settings takes place at the same array just obtained from the cache, and so this is (1) inefficient as it will happen again in the next sort with same field, (2) incorrect as if two sorts of *same* field have different missing value they will collide, and (3) unsafe as you indicated.
I was very happy with the reuse of the cache for caching the missing values so I would like to try to solve this with that "frame"... More later...

> Incorrect sort by Numeric values for documents missing the sorting field
> ------------------------------------------------------------------------
>
>                 Key: LUCENE-3390
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3390
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: core/search
>    Affects Versions: 3.3
>            Reporter: Gilad Barkai
>            Assignee: Doron Cohen
>            Priority: Minor
>              Labels: double, float, int, long, numeric, sort
>             Fix For: 3.4
>
>         Attachments: LUCENE-3390-fix-like-trunk.patch, LUCENE-3390-fix-like-trunk.patch, LUCENE-3390-fix-like-trunk.patch, LUCENE-3390.patch, SortByDouble.java
>
>
> While sorting results over a numeric field, documents which do not contain a value for the sorting field seem to get 0 (ZERO) value in the sort. (Tested against Double, Float, Int & Long numeric fields ascending and descending order).
> This behavior is unexpected, as zero is "comparable" to the rest of the values. A better solution would either be allowing the user to define such a "non-value" default, or always bring those document results as the last ones.
> Example scenario:
> Adding 3 documents, 1st with value 3.5d, 2nd with -10d, and 3rd without any value.
> Searching with MatchAllDocsQuery, with sort over that field in descending order yields the docid results of 0, 2, 1.
> Asking for the top 2 documents brings the document without any value as the 2nd result - which seems as a bug?

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] [Updated] (LUCENE-3390) Incorrect sort by Numeric values for documents missing the sorting field

Posted by "Doron Cohen (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-3390?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Doron Cohen updated LUCENE-3390:
--------------------------------

    Attachment: LUCENE-3390-BitsInterface.patch

Attached patch with a test that fails before this fix (otherwise patch same as previous).

The test uses 4 collectors simultaneously, each with different missing values.

> Incorrect sort by Numeric values for documents missing the sorting field
> ------------------------------------------------------------------------
>
>                 Key: LUCENE-3390
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3390
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: core/search
>    Affects Versions: 3.3
>            Reporter: Gilad Barkai
>            Assignee: Doron Cohen
>            Priority: Minor
>              Labels: double, float, int, long, numeric, sort
>             Fix For: 3.4
>
>         Attachments: LUCENE-3390-BitsInterface.patch, LUCENE-3390-BitsInterface.patch, LUCENE-3390-fix-like-trunk.patch, LUCENE-3390-fix-like-trunk.patch, LUCENE-3390-fix-like-trunk.patch, LUCENE-3390-fix-like-trunk.patch, LUCENE-3390.patch, SortByDouble.java
>
>
> While sorting results over a numeric field, documents which do not contain a value for the sorting field seem to get 0 (ZERO) value in the sort. (Tested against Double, Float, Int & Long numeric fields ascending and descending order).
> This behavior is unexpected, as zero is "comparable" to the rest of the values. A better solution would either be allowing the user to define such a "non-value" default, or always bring those document results as the last ones.
> Example scenario:
> Adding 3 documents, 1st with value 3.5d, 2nd with -10d, and 3rd without any value.
> Searching with MatchAllDocsQuery, with sort over that field in descending order yields the docid results of 0, 2, 1.
> Asking for the top 2 documents brings the document without any value as the 2nd result - which seems as a bug?

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] [Commented] (LUCENE-3390) Incorrect sort by Numeric values for documents missing the sorting field

Posted by "Uwe Schindler (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-3390?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13108016#comment-13108016 ] 

Uwe Schindler commented on LUCENE-3390:
---------------------------------------

bq. I am currently looking into a possibility to remove the additional uninversion, if the FieldCache already uninverted the field. The bitset should be populated together with the values, but still stored as a separate cache entry.

After reading the comments in LUCENE-2649 I don't want to discuss this again. In my opinion, it does not matter to always produce the bitset, even if not needed. But as there are people who care about maxDoc/8 additional bytes in FieldCache seem to fight against this idea, so I would leave this open.

I would like to fix the issue, any comments on the new and trunk-like FieldComparator implementation/class structure?


> Incorrect sort by Numeric values for documents missing the sorting field
> ------------------------------------------------------------------------
>
>                 Key: LUCENE-3390
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3390
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: core/search
>    Affects Versions: 3.3
>            Reporter: Gilad Barkai
>            Assignee: Doron Cohen
>            Priority: Minor
>              Labels: double, float, int, long, numeric, sort
>             Fix For: 3.4
>
>         Attachments: LUCENE-3390-fix-like-trunk.patch, LUCENE-3390-fix-like-trunk.patch, LUCENE-3390-fix-like-trunk.patch, LUCENE-3390.patch, SortByDouble.java
>
>
> While sorting results over a numeric field, documents which do not contain a value for the sorting field seem to get 0 (ZERO) value in the sort. (Tested against Double, Float, Int & Long numeric fields ascending and descending order).
> This behavior is unexpected, as zero is "comparable" to the rest of the values. A better solution would either be allowing the user to define such a "non-value" default, or always bring those document results as the last ones.
> Example scenario:
> Adding 3 documents, 1st with value 3.5d, 2nd with -10d, and 3rd without any value.
> Searching with MatchAllDocsQuery, with sort over that field in descending order yields the docid results of 0, 2, 1.
> Asking for the top 2 documents brings the document without any value as the 2nd result - which seems as a bug?

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] [Updated] (LUCENE-3390) Incorrect sort by Numeric (double) values for documents missing the sorting field

Posted by "Gilad Barkai (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-3390?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Gilad Barkai updated LUCENE-3390:
---------------------------------

    Attachment: SortByDouble.java

example code

> Incorrect sort by Numeric (double) values for documents missing the sorting field
> ---------------------------------------------------------------------------------
>
>                 Key: LUCENE-3390
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3390
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: core/search
>    Affects Versions: 3.3
>            Reporter: Gilad Barkai
>            Priority: Minor
>              Labels: double, numeric, sort
>         Attachments: SortByDouble.java
>
>
> While sorting results over a numeric double field, documents which do not contain a value for the sorting field seem to get 0 (ZERO) value in the sort. 
> This behavior is unexpected, as zero is "comparable" to the rest of the values. A better solution would either be allowing the user to define such a "non-value" default, or always bring those document results as the last ones.
> Example scenario:
> Adding 3 documents, 1st with value 3.5d, 2nd with -10d, and 3rd without any value.
> Searching with MatchAllDocsQuery, with sort over that field in descending order yields the docid results of 0, 2, 1.
> Example code:
> public static void main(String[] args) throws Exception {
> 	RAMDirectory d = new RAMDirectory();
> 	IndexWriter w = new IndexWriter(d, new IndexWriterConfig(Version.LUCENE_33, new KeywordAnalyzer()));
> 	
> 	// 1st doc, value 3.5d
> 	Document doc = new Document();
> 	doc.add(new NumericField("f", Store.YES, true).setDoubleValue(3.5d));
> 	w.addDocument(doc);
> 	
> 	// 2nd doc, value of -10d
> 	doc = new Document();
> 	doc.add(new NumericField("f", Store.YES, true).setDoubleValue(-10d));
> 	w.addDocument(doc);
> 	
> 	// 3rd doc, no value at all
> 	w.addDocument(new Document());
> 	w.close();
> 	IndexSearcher s = new IndexSearcher(d);
> 	Sort sort = new Sort(new SortField("f", SortField.DOUBLE, true));
> 	TopDocs td = s.search(new MatchAllDocsQuery(), 10, sort);
> 	for (ScoreDoc sd : td.scoreDocs) {
> 		System.out.println(sd.doc + ": " + s.doc(sd.doc).get("f"));
> 	}
> 	s.close();
> 	d.close();
> }
>  

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] [Commented] (LUCENE-3390) Incorrect sort by Numeric values for documents missing the sorting field

Posted by "Uwe Schindler (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-3390?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13109411#comment-13109411 ] 

Uwe Schindler commented on LUCENE-3390:
---------------------------------------

In my opinion a much more clean and simple approach for FieldComaparator and all other stuff would be the following, as it removes all additional branches from FieldComaparator and makes the code as simple as it was before missingValues at all (also in trunk):

{quote}
Thinking more about it: Another apporoach (also possible for trunk) is to supply the missing value to FieldCache.getXxx(). The FieldCache would the first use Arrays.fill() to populate the FieldCache array with the default value and after that populate the index values. The drawback is that you get a separate FieldCache entry for each distinct missing value. For the above se case, you would have two float/double price caches.
{quote}

We just have to think about additional memory requirements (which would affect only users actually using different missingValues for several searches). From my perspective this is much cleaner, as you can pass in a missingValue directly when populating the FieldCache. FieldComaparator would simply call FieldCache.DEFAULT.getInts(reader, parser, defaultValue). The cache would use the triplet including defaultValue as key. The sorting code would not need to be changed at all (this is similar to Doron's idea, but moved to FieldCache and not FC.setNextReader).

We should think about this in an additional issue and for now only fix the broken implementation in 3.x.

> Incorrect sort by Numeric values for documents missing the sorting field
> ------------------------------------------------------------------------
>
>                 Key: LUCENE-3390
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3390
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: core/search
>    Affects Versions: 3.3
>            Reporter: Gilad Barkai
>            Assignee: Doron Cohen
>            Priority: Minor
>              Labels: double, float, int, long, numeric, sort
>             Fix For: 3.4
>
>         Attachments: LUCENE-3390-fix-like-trunk.patch, LUCENE-3390-fix-like-trunk.patch, LUCENE-3390-fix-like-trunk.patch, LUCENE-3390-fix-like-trunk.patch, LUCENE-3390.patch, SortByDouble.java
>
>
> While sorting results over a numeric field, documents which do not contain a value for the sorting field seem to get 0 (ZERO) value in the sort. (Tested against Double, Float, Int & Long numeric fields ascending and descending order).
> This behavior is unexpected, as zero is "comparable" to the rest of the values. A better solution would either be allowing the user to define such a "non-value" default, or always bring those document results as the last ones.
> Example scenario:
> Adding 3 documents, 1st with value 3.5d, 2nd with -10d, and 3rd without any value.
> Searching with MatchAllDocsQuery, with sort over that field in descending order yields the docid results of 0, 2, 1.
> Asking for the top 2 documents brings the document without any value as the 2nd result - which seems as a bug?

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] [Issue Comment Edited] (LUCENE-3390) Incorrect sort by Numeric values for documents missing the sorting field

Posted by "Uwe Schindler (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-3390?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13107480#comment-13107480 ] 

Uwe Schindler edited comment on LUCENE-3390 at 9/18/11 5:43 PM:
----------------------------------------------------------------

The problem is that it's not easy to fix, because the unvalued values BitSet is exposed as DocIdSet not Bits like in trunk. The only quick fix is to clone the cached values before modifying in setNextReader and leave code unchanged.

Correct fix is to return Bits from getUnvaluedValues and so can do random access in compare/compareBottom, FieldComparator code can be copied 1:1 from trunk, only small changes needed. This would break code already using the new APIs in 3.x, but this is too broken, so we *must* change/fix this.

      was (Author: thetaphi):
    The problem is that it's not easy to fix, because the unvalued values BitSet is exposed as DocIdSet not Bits like in trunk. The only quick fix is to clone the cached values before modifying in setNextReader and leave code unchanged.

Correct fix is to return Bits from getUnvaluedValues and so can do random access in compare/compareBottom, FieldComparator code can be copied 1:1 from trunk, only small changes needed.
  
> Incorrect sort by Numeric values for documents missing the sorting field
> ------------------------------------------------------------------------
>
>                 Key: LUCENE-3390
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3390
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: core/search
>    Affects Versions: 3.3
>            Reporter: Gilad Barkai
>            Assignee: Doron Cohen
>            Priority: Minor
>              Labels: double, float, int, long, numeric, sort
>             Fix For: 3.4
>
>         Attachments: LUCENE-3390.patch, SortByDouble.java
>
>
> While sorting results over a numeric field, documents which do not contain a value for the sorting field seem to get 0 (ZERO) value in the sort. (Tested against Double, Float, Int & Long numeric fields ascending and descending order).
> This behavior is unexpected, as zero is "comparable" to the rest of the values. A better solution would either be allowing the user to define such a "non-value" default, or always bring those document results as the last ones.
> Example scenario:
> Adding 3 documents, 1st with value 3.5d, 2nd with -10d, and 3rd without any value.
> Searching with MatchAllDocsQuery, with sort over that field in descending order yields the docid results of 0, 2, 1.
> Asking for the top 2 documents brings the document without any value as the 2nd result - which seems as a bug?

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] [Commented] (LUCENE-3390) Incorrect sort by Numeric values for documents missing the sorting field

Posted by "Uwe Schindler (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-3390?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13107480#comment-13107480 ] 

Uwe Schindler commented on LUCENE-3390:
---------------------------------------

The problem is that it's not easy to fix, because the unvalued values BitSet is exposed as DocIdSet not Bits like in trunk. The only quick fix is to clone the cached values in setNextReader and leave code unchanged.

Correct fix is to return Bits from getUnvaluedValues and so can do random access in compare/compareBottom, FieldComparator code can be copied 1:1 from trunk, only small changes needed.

> Incorrect sort by Numeric values for documents missing the sorting field
> ------------------------------------------------------------------------
>
>                 Key: LUCENE-3390
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3390
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: core/search
>    Affects Versions: 3.3
>            Reporter: Gilad Barkai
>            Assignee: Doron Cohen
>            Priority: Minor
>              Labels: double, float, int, long, numeric, sort
>             Fix For: 3.4
>
>         Attachments: LUCENE-3390.patch, SortByDouble.java
>
>
> While sorting results over a numeric field, documents which do not contain a value for the sorting field seem to get 0 (ZERO) value in the sort. (Tested against Double, Float, Int & Long numeric fields ascending and descending order).
> This behavior is unexpected, as zero is "comparable" to the rest of the values. A better solution would either be allowing the user to define such a "non-value" default, or always bring those document results as the last ones.
> Example scenario:
> Adding 3 documents, 1st with value 3.5d, 2nd with -10d, and 3rd without any value.
> Searching with MatchAllDocsQuery, with sort over that field in descending order yields the docid results of 0, 2, 1.
> Asking for the top 2 documents brings the document without any value as the 2nd result - which seems as a bug?

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] [Updated] (LUCENE-3390) Incorrect sort by Numeric values for documents missing the sorting field

Posted by "Gilad Barkai (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-3390?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Gilad Barkai updated LUCENE-3390:
---------------------------------

         Labels: double float int long numeric sort  (was: double numeric sort)
    Description: 
While sorting results over a numeric field, documents which do not contain a value for the sorting field seem to get 0 (ZERO) value in the sort. (Tested against Double, Float, Int & Long numeric fields ascending and descending order).
This behavior is unexpected, as zero is "comparable" to the rest of the values. A better solution would either be allowing the user to define such a "non-value" default, or always bring those document results as the last ones.

Example scenario:
Adding 3 documents, 1st with value 3.5d, 2nd with -10d, and 3rd without any value.
Searching with MatchAllDocsQuery, with sort over that field in descending order yields the docid results of 0, 2, 1.

Asking for the top 2 documents brings the document without any value as the 2nd result - which seems as a bug?

  was:
While sorting results over a numeric double field, documents which do not contain a value for the sorting field seem to get 0 (ZERO) value in the sort. 
This behavior is unexpected, as zero is "comparable" to the rest of the values. A better solution would either be allowing the user to define such a "non-value" default, or always bring those document results as the last ones.

Example scenario:
Adding 3 documents, 1st with value 3.5d, 2nd with -10d, and 3rd without any value.
Searching with MatchAllDocsQuery, with sort over that field in descending order yields the docid results of 0, 2, 1.

While the document with the missing value does match the query, I would expect it to come last, as it is not comparable by the other documents. For example, asking for the top 2 documents brings the document without any value which seems as a bug?

        Summary: Incorrect sort by Numeric values for documents missing the sorting field  (was: Incorrect sort by Numeric (double) values for documents missing the sorting field)

> Incorrect sort by Numeric values for documents missing the sorting field
> ------------------------------------------------------------------------
>
>                 Key: LUCENE-3390
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3390
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: core/search
>    Affects Versions: 3.3
>            Reporter: Gilad Barkai
>            Priority: Minor
>              Labels: double, float, int, long, numeric, sort
>         Attachments: SortByDouble.java
>
>
> While sorting results over a numeric field, documents which do not contain a value for the sorting field seem to get 0 (ZERO) value in the sort. (Tested against Double, Float, Int & Long numeric fields ascending and descending order).
> This behavior is unexpected, as zero is "comparable" to the rest of the values. A better solution would either be allowing the user to define such a "non-value" default, or always bring those document results as the last ones.
> Example scenario:
> Adding 3 documents, 1st with value 3.5d, 2nd with -10d, and 3rd without any value.
> Searching with MatchAllDocsQuery, with sort over that field in descending order yields the docid results of 0, 2, 1.
> Asking for the top 2 documents brings the document without any value as the 2nd result - which seems as a bug?

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] [Updated] (LUCENE-3390) Incorrect sort by Numeric values for documents missing the sorting field

Posted by "Uwe Schindler (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-3390?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Uwe Schindler updated LUCENE-3390:
----------------------------------

    Attachment: LUCENE-3390-inverted.patch

Patch with the BitSet inverted. We break backwards compatibility so this is not an issue at all.

> Incorrect sort by Numeric values for documents missing the sorting field
> ------------------------------------------------------------------------
>
>                 Key: LUCENE-3390
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3390
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: core/search
>    Affects Versions: 3.3
>            Reporter: Gilad Barkai
>            Assignee: Uwe Schindler
>            Priority: Minor
>              Labels: double, float, int, long, numeric, sort
>             Fix For: 3.4, 3.5
>
>         Attachments: LUCENE-3390-BitsInterface.patch, LUCENE-3390-BitsInterface.patch, LUCENE-3390-BitsInterface.patch, LUCENE-3390-fix-like-trunk.patch, LUCENE-3390-fix-like-trunk.patch, LUCENE-3390-fix-like-trunk.patch, LUCENE-3390-fix-like-trunk.patch, LUCENE-3390-inverted.patch, LUCENE-3390.patch, SortByDouble.java
>
>
> While sorting results over a numeric field, documents which do not contain a value for the sorting field seem to get 0 (ZERO) value in the sort. (Tested against Double, Float, Int & Long numeric fields ascending and descending order).
> This behavior is unexpected, as zero is "comparable" to the rest of the values. A better solution would either be allowing the user to define such a "non-value" default, or always bring those document results as the last ones.
> Example scenario:
> Adding 3 documents, 1st with value 3.5d, 2nd with -10d, and 3rd without any value.
> Searching with MatchAllDocsQuery, with sort over that field in descending order yields the docid results of 0, 2, 1.
> Asking for the top 2 documents brings the document without any value as the 2nd result - which seems as a bug?

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] [Issue Comment Edited] (LUCENE-3390) Incorrect sort by Numeric values for documents missing the sorting field

Posted by "Uwe Schindler (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-3390?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13107480#comment-13107480 ] 

Uwe Schindler edited comment on LUCENE-3390 at 9/18/11 5:42 PM:
----------------------------------------------------------------

The problem is that it's not easy to fix, because the unvalued values BitSet is exposed as DocIdSet not Bits like in trunk. The only quick fix is to clone the cached values before modifying in setNextReader and leave code unchanged.

Correct fix is to return Bits from getUnvaluedValues and so can do random access in compare/compareBottom, FieldComparator code can be copied 1:1 from trunk, only small changes needed.

      was (Author: thetaphi):
    The problem is that it's not easy to fix, because the unvalued values BitSet is exposed as DocIdSet not Bits like in trunk. The only quick fix is to clone the cached values in setNextReader and leave code unchanged.

Correct fix is to return Bits from getUnvaluedValues and so can do random access in compare/compareBottom, FieldComparator code can be copied 1:1 from trunk, only small changes needed.
  
> Incorrect sort by Numeric values for documents missing the sorting field
> ------------------------------------------------------------------------
>
>                 Key: LUCENE-3390
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3390
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: core/search
>    Affects Versions: 3.3
>            Reporter: Gilad Barkai
>            Assignee: Doron Cohen
>            Priority: Minor
>              Labels: double, float, int, long, numeric, sort
>             Fix For: 3.4
>
>         Attachments: LUCENE-3390.patch, SortByDouble.java
>
>
> While sorting results over a numeric field, documents which do not contain a value for the sorting field seem to get 0 (ZERO) value in the sort. (Tested against Double, Float, Int & Long numeric fields ascending and descending order).
> This behavior is unexpected, as zero is "comparable" to the rest of the values. A better solution would either be allowing the user to define such a "non-value" default, or always bring those document results as the last ones.
> Example scenario:
> Adding 3 documents, 1st with value 3.5d, 2nd with -10d, and 3rd without any value.
> Searching with MatchAllDocsQuery, with sort over that field in descending order yields the docid results of 0, 2, 1.
> Asking for the top 2 documents brings the document without any value as the 2nd result - which seems as a bug?

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] [Commented] (LUCENE-3390) Incorrect sort by Numeric values for documents missing the sorting field

Posted by "Uwe Schindler (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-3390?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13109332#comment-13109332 ] 

Uwe Schindler commented on LUCENE-3390:
---------------------------------------

Doron: That's exactly the problem. This easy use case is problematic:

You allow sorting by Price. The user can switch between forward and backward sorting. In all cases, you want all articles without a price at the beginning. To achieve this, you have to set the price value e.g. to negative_infinity for the forward sorting, but positive_infinity for backwards sorting. If now two users are using your user interface in parallel, they collide.

The fix used here is identical to Lucene trunk and we should keep the code similar. FieldComparator is now almost identical between trunk and 3.x (except the new BytesRef/Docvalues stuff in trunk).

Thinking more about it: Another apporoach (also possible for trunk) is to supply the missing value to FieldCache.getXxx(). The FieldCache would the first use Arrays.fill() to populate the FieldCache array with the default value and after that populate the index values. The drawback is that you get a separate FieldCache entry for each distinct missing value. For the above se case, you would have two float/double price caches.

> Incorrect sort by Numeric values for documents missing the sorting field
> ------------------------------------------------------------------------
>
>                 Key: LUCENE-3390
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3390
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: core/search
>    Affects Versions: 3.3
>            Reporter: Gilad Barkai
>            Assignee: Doron Cohen
>            Priority: Minor
>              Labels: double, float, int, long, numeric, sort
>             Fix For: 3.4
>
>         Attachments: LUCENE-3390-fix-like-trunk.patch, LUCENE-3390-fix-like-trunk.patch, LUCENE-3390-fix-like-trunk.patch, LUCENE-3390.patch, SortByDouble.java
>
>
> While sorting results over a numeric field, documents which do not contain a value for the sorting field seem to get 0 (ZERO) value in the sort. (Tested against Double, Float, Int & Long numeric fields ascending and descending order).
> This behavior is unexpected, as zero is "comparable" to the rest of the values. A better solution would either be allowing the user to define such a "non-value" default, or always bring those document results as the last ones.
> Example scenario:
> Adding 3 documents, 1st with value 3.5d, 2nd with -10d, and 3rd without any value.
> Searching with MatchAllDocsQuery, with sort over that field in descending order yields the docid results of 0, 2, 1.
> Asking for the top 2 documents brings the document without any value as the 2nd result - which seems as a bug?

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] [Updated] (LUCENE-3390) Incorrect sort by Numeric values for documents missing the sorting field

Posted by "Uwe Schindler (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-3390?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Uwe Schindler updated LUCENE-3390:
----------------------------------

    Attachment: LUCENE-3390-BitsInterface.patch

Here a patch with a more clean API (as noted by Mike McCandless):
- backported the Bits interface from Lucene trunk (do a svn cp http://svn.apache.org/..../trunk/..../Bits.java before applying the patch
- Added interface to the well-known impls in util package
- FieldCache.getUnValuesDocs returns Bits now which makes the API very clean

This breaks backwards a bit more, as Bits does not extend DocIdSet, so code using the new FieldCache method will break, before recompilation was enough (as FixedBitSet extends DocIdSet).

Mike: How about this?

> Incorrect sort by Numeric values for documents missing the sorting field
> ------------------------------------------------------------------------
>
>                 Key: LUCENE-3390
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3390
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: core/search
>    Affects Versions: 3.3
>            Reporter: Gilad Barkai
>            Assignee: Doron Cohen
>            Priority: Minor
>              Labels: double, float, int, long, numeric, sort
>             Fix For: 3.4
>
>         Attachments: LUCENE-3390-BitsInterface.patch, LUCENE-3390-fix-like-trunk.patch, LUCENE-3390-fix-like-trunk.patch, LUCENE-3390-fix-like-trunk.patch, LUCENE-3390-fix-like-trunk.patch, LUCENE-3390.patch, SortByDouble.java
>
>
> While sorting results over a numeric field, documents which do not contain a value for the sorting field seem to get 0 (ZERO) value in the sort. (Tested against Double, Float, Int & Long numeric fields ascending and descending order).
> This behavior is unexpected, as zero is "comparable" to the rest of the values. A better solution would either be allowing the user to define such a "non-value" default, or always bring those document results as the last ones.
> Example scenario:
> Adding 3 documents, 1st with value 3.5d, 2nd with -10d, and 3rd without any value.
> Searching with MatchAllDocsQuery, with sort over that field in descending order yields the docid results of 0, 2, 1.
> Asking for the top 2 documents brings the document without any value as the 2nd result - which seems as a bug?

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] [Reopened] (LUCENE-3390) Incorrect sort by Numeric values for documents missing the sorting field

Posted by "Uwe Schindler (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-3390?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Uwe Schindler reopened LUCENE-3390:
-----------------------------------


There is a bug in this impl:
If you have two different SortFields with different missing values and you do sorts on both in parallel, the setNextReader in the comparator modifies the shared field cache, leading to concurrency issues.

I dont understand why this is done in setNextReader? Why not do the Bits check in the comparator itsself instead of merging the missing values directly into the field cache, which is a violation to multithreading, caching,...

Also doing the merging of missing values into the cache on every setNextReader seems like lots of additional work on every sort.

We should do the check for the missing value directly in the compare/compareBottom method (like we do on trunk).

> Incorrect sort by Numeric values for documents missing the sorting field
> ------------------------------------------------------------------------
>
>                 Key: LUCENE-3390
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3390
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: core/search
>    Affects Versions: 3.3
>            Reporter: Gilad Barkai
>            Assignee: Doron Cohen
>            Priority: Minor
>              Labels: double, float, int, long, numeric, sort
>             Fix For: 3.4
>
>         Attachments: LUCENE-3390.patch, SortByDouble.java
>
>
> While sorting results over a numeric field, documents which do not contain a value for the sorting field seem to get 0 (ZERO) value in the sort. (Tested against Double, Float, Int & Long numeric fields ascending and descending order).
> This behavior is unexpected, as zero is "comparable" to the rest of the values. A better solution would either be allowing the user to define such a "non-value" default, or always bring those document results as the last ones.
> Example scenario:
> Adding 3 documents, 1st with value 3.5d, 2nd with -10d, and 3rd without any value.
> Searching with MatchAllDocsQuery, with sort over that field in descending order yields the docid results of 0, 2, 1.
> Asking for the top 2 documents brings the document without any value as the 2nd result - which seems as a bug?

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] [Reopened] (LUCENE-3390) Incorrect sort by Numeric values for documents missing the sorting field

Posted by "Uwe Schindler (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-3390?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Uwe Schindler reopened LUCENE-3390:
-----------------------------------


When discussing about the forward port with Mike McCandless on IRC, we thought the double inversion is useless (it was in Doron's patch, because he wanted to use DocIdSetIterator effectively).

We changed the name to FieldCache.getDocsWithField().

Patch is easy.

> Incorrect sort by Numeric values for documents missing the sorting field
> ------------------------------------------------------------------------
>
>                 Key: LUCENE-3390
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3390
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: core/search
>    Affects Versions: 3.3
>            Reporter: Gilad Barkai
>            Assignee: Uwe Schindler
>            Priority: Minor
>              Labels: double, float, int, long, numeric, sort
>             Fix For: 3.4, 3.5
>
>         Attachments: LUCENE-3390-BitsInterface.patch, LUCENE-3390-BitsInterface.patch, LUCENE-3390-BitsInterface.patch, LUCENE-3390-fix-like-trunk.patch, LUCENE-3390-fix-like-trunk.patch, LUCENE-3390-fix-like-trunk.patch, LUCENE-3390-fix-like-trunk.patch, LUCENE-3390-inverted.patch, LUCENE-3390.patch, SortByDouble.java
>
>
> While sorting results over a numeric field, documents which do not contain a value for the sorting field seem to get 0 (ZERO) value in the sort. (Tested against Double, Float, Int & Long numeric fields ascending and descending order).
> This behavior is unexpected, as zero is "comparable" to the rest of the values. A better solution would either be allowing the user to define such a "non-value" default, or always bring those document results as the last ones.
> Example scenario:
> Adding 3 documents, 1st with value 3.5d, 2nd with -10d, and 3rd without any value.
> Searching with MatchAllDocsQuery, with sort over that field in descending order yields the docid results of 0, 2, 1.
> Asking for the top 2 documents brings the document without any value as the 2nd result - which seems as a bug?

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] [Resolved] (LUCENE-3390) Incorrect sort by Numeric values for documents missing the sorting field

Posted by "Uwe Schindler (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-3390?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Uwe Schindler resolved LUCENE-3390.
-----------------------------------

       Resolution: Fixed
    Fix Version/s: 3.5

Committed 3.x branch revision: 1173701

> Incorrect sort by Numeric values for documents missing the sorting field
> ------------------------------------------------------------------------
>
>                 Key: LUCENE-3390
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3390
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: core/search
>    Affects Versions: 3.3
>            Reporter: Gilad Barkai
>            Assignee: Uwe Schindler
>            Priority: Minor
>              Labels: double, float, int, long, numeric, sort
>             Fix For: 3.5, 3.4
>
>         Attachments: LUCENE-3390-BitsInterface.patch, LUCENE-3390-BitsInterface.patch, LUCENE-3390-BitsInterface.patch, LUCENE-3390-fix-like-trunk.patch, LUCENE-3390-fix-like-trunk.patch, LUCENE-3390-fix-like-trunk.patch, LUCENE-3390-fix-like-trunk.patch, LUCENE-3390.patch, SortByDouble.java
>
>
> While sorting results over a numeric field, documents which do not contain a value for the sorting field seem to get 0 (ZERO) value in the sort. (Tested against Double, Float, Int & Long numeric fields ascending and descending order).
> This behavior is unexpected, as zero is "comparable" to the rest of the values. A better solution would either be allowing the user to define such a "non-value" default, or always bring those document results as the last ones.
> Example scenario:
> Adding 3 documents, 1st with value 3.5d, 2nd with -10d, and 3rd without any value.
> Searching with MatchAllDocsQuery, with sort over that field in descending order yields the docid results of 0, 2, 1.
> Asking for the top 2 documents brings the document without any value as the 2nd result - which seems as a bug?

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] [Assigned] (LUCENE-3390) Incorrect sort by Numeric values for documents missing the sorting field

Posted by "Doron Cohen (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-3390?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Doron Cohen reassigned LUCENE-3390:
-----------------------------------

    Assignee: Doron Cohen

> Incorrect sort by Numeric values for documents missing the sorting field
> ------------------------------------------------------------------------
>
>                 Key: LUCENE-3390
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3390
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: core/search
>    Affects Versions: 3.3
>            Reporter: Gilad Barkai
>            Assignee: Doron Cohen
>            Priority: Minor
>              Labels: double, float, int, long, numeric, sort
>         Attachments: LUCENE-3390.patch, SortByDouble.java
>
>
> While sorting results over a numeric field, documents which do not contain a value for the sorting field seem to get 0 (ZERO) value in the sort. (Tested against Double, Float, Int & Long numeric fields ascending and descending order).
> This behavior is unexpected, as zero is "comparable" to the rest of the values. A better solution would either be allowing the user to define such a "non-value" default, or always bring those document results as the last ones.
> Example scenario:
> Adding 3 documents, 1st with value 3.5d, 2nd with -10d, and 3rd without any value.
> Searching with MatchAllDocsQuery, with sort over that field in descending order yields the docid results of 0, 2, 1.
> Asking for the top 2 documents brings the document without any value as the 2nd result - which seems as a bug?

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] [Updated] (LUCENE-3390) Incorrect sort by Numeric values for documents missing the sorting field

Posted by "Doron Cohen (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-3390?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Doron Cohen updated LUCENE-3390:
--------------------------------

    Attachment: LUCENE-3390.patch

Attached patch fixing this bug. 
TestSort was enhanced to test the new setMissingValue() method - actually merging the test from trunk r1002460 (LUCENE-2671).

All search test passed (running the rest now..)

> Incorrect sort by Numeric values for documents missing the sorting field
> ------------------------------------------------------------------------
>
>                 Key: LUCENE-3390
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3390
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: core/search
>    Affects Versions: 3.3
>            Reporter: Gilad Barkai
>            Priority: Minor
>              Labels: double, float, int, long, numeric, sort
>         Attachments: LUCENE-3390.patch, SortByDouble.java
>
>
> While sorting results over a numeric field, documents which do not contain a value for the sorting field seem to get 0 (ZERO) value in the sort. (Tested against Double, Float, Int & Long numeric fields ascending and descending order).
> This behavior is unexpected, as zero is "comparable" to the rest of the values. A better solution would either be allowing the user to define such a "non-value" default, or always bring those document results as the last ones.
> Example scenario:
> Adding 3 documents, 1st with value 3.5d, 2nd with -10d, and 3rd without any value.
> Searching with MatchAllDocsQuery, with sort over that field in descending order yields the docid results of 0, 2, 1.
> Asking for the top 2 documents brings the document without any value as the 2nd result - which seems as a bug?

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org