Posted to dev@lucene.apache.org by "Todd Feak (JIRA)" <ji...@apache.org> on 2008/10/09 01:31:45 UTC

[jira] Created: (LUCENE-1415) MultiPhraseQuery has incorrect hashCode() implementation - Leads to Solr Cache misses

MultiPhraseQuery has incorrect hashCode() implementation - Leads to Solr Cache misses
-------------------------------------------------------------------------------------

                 Key: LUCENE-1415
                 URL: https://issues.apache.org/jira/browse/LUCENE-1415
             Project: Lucene - Java
          Issue Type: Bug
          Components: Search
    Affects Versions: 2.4
            Reporter: Todd Feak


I found this while hunting for the cause of Solr Cache misses.

The MultiPhraseQuery class hashCode() implementation is non-deterministic. It uses termArrays.hashCode() in the computation. The contents of that ArrayList are themselves arrays, which return their reference ID as a hashCode instead of a hashCode based on the contents of the array. I would suggest an implementation involving the Arrays.hashCode() method.

I will try to submit a patch soon; I'm off for today.
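
To make the problem concrete, here is a minimal sketch in plain Java. It is not the MultiPhraseQuery code: String[] stands in for Lucene's Term[], and ArrayHashDemo, build and contentHash are hypothetical names chosen for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical demo class; String[] stands in for Lucene's Term[].
class ArrayHashDemo {

    // Builds a list holding one freshly allocated array with the given contents.
    static List<String[]> build(String... terms) {
        List<String[]> list = new ArrayList<String[]>();
        list.add(terms.clone());
        return list;
    }

    // Content-based hash: combines Arrays.hashCode() of every element array.
    static int contentHash(List<String[]> list) {
        int h = 1;
        for (String[] arr : list) {
            h = 31 * h + Arrays.hashCode(arr);
        }
        return h;
    }

    public static void main(String[] args) {
        List<String[]> a = build("quick", "fast");
        List<String[]> b = build("quick", "fast");

        // List.hashCode() calls each array's Object.hashCode(), which is
        // identity-based, so the two values are not expected to match.
        System.out.println(a.hashCode() == b.hashCode());
        // List.equals() calls array.equals(), i.e. reference equality.
        System.out.println(a.equals(b)); // false
        // The content-based hash agrees for equal contents.
        System.out.println(contentHash(a) == contentHash(b)); // true
    }
}
```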

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


[jira] Issue Comment Edited: (LUCENE-1415) MultiPhraseQuery has incorrect hashCode() implementation - Leads to Solr Cache misses

Posted by "Todd Feak (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-1415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12638319#action_12638319 ] 

tfeak edited comment on LUCENE-1415 at 10/9/08 9:31 AM:
------------------------------------------------------------

I've attached a TestCase demonstrating the broken functionality.

I realize that this isn't the standard format. I'm not set up to create SVN patches from my current workstation, and I'm in a bit of a crunch. I hope that this can at least provide some level of assistance in rectifying this situation.

      was (Author: tfeak):
    TestCase demonstrating the broken functionality.

I realize that this isn't the standard format. I apologize, as this is all I have time for right now.
  



[jira] Updated: (LUCENE-1415) MultiPhraseQuery has incorrect hashCode() implementation - Leads to Solr Cache misses

Posted by "Todd Feak (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-1415?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Todd Feak updated LUCENE-1415:
------------------------------

    Attachment: MultiPhraseQuery.java

Attached a copy of what I did to MultiPhraseQuery to fix the issue. This was created from the 2.4.0 source code. The implementation of hashCode() and equals() uses the Java List implementation as a base, so as to achieve what looks like the original intent of the comparisons while taking the Term[] contents into account.
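
As a rough illustration of that approach, the sketch below follows the List.hashCode() and List.equals() contracts but looks inside each array. It is not the attached MultiPhraseQuery.java: String[] stands in for Term[], the names TermArrays, hashOf and sameContents are hypothetical, and Java 5 features are used for brevity.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Hypothetical helper; String[] stands in for Lucene's Term[].
class TermArrays {

    // Follows the List.hashCode() formula, but hashes each array by content.
    static int hashOf(List<String[]> termArrays) {
        int hashCode = 1;
        for (String[] terms : termArrays) {
            hashCode = 31 * hashCode + Arrays.hashCode(terms);
        }
        return hashCode;
    }

    // Follows List.equals(), but compares each pair of arrays by content.
    static boolean sameContents(List<String[]> one, List<String[]> two) {
        Iterator<String[]> it1 = one.iterator();
        Iterator<String[]> it2 = two.iterator();
        while (it1.hasNext() && it2.hasNext()) {
            if (!Arrays.equals(it1.next(), it2.next())) {
                return false;
            }
        }
        // Equal only if both lists ran out at the same time.
        return !(it1.hasNext() || it2.hasNext());
    }
}
```

Keeping the two methods in step matters: any two lists that sameContents() treats as equal also get the same hashOf() value, which is what hash-based caches rely on.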

Again, sorry it's not in the correct format. Hope it helps.




[jira] Resolved: (LUCENE-1415) MultiPhraseQuery has incorrect hashCode() implementation - Leads to Solr Cache misses

Posted by "Yonik Seeley (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-1415?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yonik Seeley resolved LUCENE-1415.
----------------------------------

    Resolution: Fixed

Thanks, I just committed this.




[jira] Updated: (LUCENE-1415) MultiPhraseQuery has incorrect hashCode() implementation - Leads to Solr Cache misses

Posted by "Mark Miller (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-1415?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mark Miller updated LUCENE-1415:
--------------------------------

    Attachment: LUCENE-1415.patch

Hmmm... I really thought I had my environment set up to limit me to 1.4 code... it would appear that's not working...

Here is a 1.4 patch.




[jira] Commented: (LUCENE-1415) MultiPhraseQuery has incorrect hashCode() implementation - Leads to Solr Cache misses

Posted by "Uwe Schindler (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-1415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12638378#action_12638378 ] 

Uwe Schindler commented on LUCENE-1415:
---------------------------------------

That's clear: if you set the compilation option of Java 5/6 to limit to 1.4 features, this only prevents you from using the language features of Java 5. The underlying class library is still the one from the Java distribution the compiler comes from (Java 5's rt.jar), which contains Arrays.hashCode(). The compiler cannot know that Arrays.hashCode() is not available in 1.4 unless it uses an old rt.jar. If you want to be sure to compile for 1.4 only, you have to install Java 1.4.
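
When Arrays.hashCode() is not available in the class library, one option is to compute a content-based hash by hand. The sketch below is only an illustration and is not taken from the attached patch; ArrayHash14 and hash are hypothetical names. It avoids generics, for-each loops and varargs, and uses the same formula that List.hashCode() documents.

```java
// Hypothetical helper written without Java 5 language or library features.
class ArrayHash14 {

    // Same formula as List.hashCode(): start at 1, then 31 * h + elementHash.
    static int hash(Object[] array) {
        if (array == null) {
            return 0;
        }
        int h = 1;
        for (int i = 0; i < array.length; i++) {
            Object element = array[i];
            h = 31 * h + (element == null ? 0 : element.hashCode());
        }
        return h;
    }
}
```

On a Java 5 or later class library, this returns the same value as Arrays.hashCode(Object[]) for the same input.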




[jira] Assigned: (LUCENE-1415) MultiPhraseQuery has incorrect hashCode() implementation - Leads to Solr Cache misses

Posted by "Yonik Seeley (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-1415?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yonik Seeley reassigned LUCENE-1415:
------------------------------------

    Assignee: Yonik Seeley




[jira] Commented: (LUCENE-1415) MultiPhraseQuery has incorrect hashCode() implementation - Leads to Solr Cache misses

Posted by "Yonik Seeley (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-1415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12638163#action_12638163 ] 

Yonik Seeley commented on LUCENE-1415:
--------------------------------------

Good catch, Todd. This can be demonstrated in Solr with the example server and a query of
http://localhost:8983/solr/select/?q=ccc
(ccc has synonyms which end up creating a MultiPhraseQuery)
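
The effect on a cache can be sketched without Solr at all. The example below assumes a cache that behaves like a HashMap keyed on the query object; CacheMissDemo, IdentityKey, ContentKey and hit are hypothetical stand-ins, not Solr or Lucene classes, and the term strings are made up.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-ins: query-like keys and a cache modeled as a HashMap.
class CacheMissDemo {

    // Key that inherits Object.hashCode()/equals(), so it is identity-based.
    static class IdentityKey {
        final String[] terms;
        IdentityKey(String... terms) { this.terms = terms; }
    }

    // Key that hashes and compares its terms by content.
    static class ContentKey {
        final String[] terms;
        ContentKey(String... terms) { this.terms = terms; }
        @Override public int hashCode() { return Arrays.hashCode(terms); }
        @Override public boolean equals(Object o) {
            return o instanceof ContentKey
                && Arrays.equals(terms, ((ContentKey) o).terms);
        }
    }

    // Stores a result under one key, then looks it up with another key.
    static boolean hit(Object stored, Object lookup) {
        Map<Object, String> cache = new HashMap<Object, String>();
        cache.put(stored, "cached result");
        return cache.containsKey(lookup);
    }

    public static void main(String[] args) {
        // Same terms, separately built keys: identity-based key misses.
        System.out.println(hit(new IdentityKey("ccc", "other"),
                               new IdentityKey("ccc", "other"))); // false
        // Content-based key hits.
        System.out.println(hit(new ContentKey("ccc", "other"),
                               new ContentKey("ccc", "other")));  // true
    }
}
```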




[jira] Updated: (LUCENE-1415) MultiPhraseQuery has incorrect hashCode() implementation - Leads to Solr Cache misses

Posted by "Mark Miller (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-1415?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mark Miller updated LUCENE-1415:
--------------------------------

    Attachment: LUCENE-1415.patch

Patch that cleans up formatting and merges the unit test with the existing MultiPhraseQuery test.

Without the MultiPhraseQuery change, the new test fails. With the change, all tests pass.




[jira] Updated: (LUCENE-1415) MultiPhraseQuery has incorrect hashCode() implementation - Leads to Solr Cache misses

Posted by "Todd Feak (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-1415?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Todd Feak updated LUCENE-1415:
------------------------------

    Attachment: MultiPhraseQueryTest.java

TestCase demonstrating the broken functionality.

I realize that this isn't the standard format. I apologize, as this is all I have time for right now.




[jira] Commented: (LUCENE-1415) MultiPhraseQuery has incorrect hashCode() implementation - Leads to Solr Cache misses

Posted by "Yonik Seeley (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-1415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12638342#action_12638342 ] 

Yonik Seeley commented on LUCENE-1415:
--------------------------------------

Thanks guys,
I believe Arrays.hashCode() is a Java 5 feature?

