Posted to issues@lucene.apache.org by "Uwe Schindler (Jira)" <ji...@apache.org> on 2022/06/10 12:50:00 UTC

[jira] [Comment Edited] (LUCENE-10610) RunAutomaton#hashCode() can easily cause hash collision for different Automatons

    [ https://issues.apache.org/jira/browse/LUCENE-10610?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17552733#comment-17552733 ] 

Uwe Schindler edited comment on LUCENE-10610 at 6/10/22 12:49 PM:
------------------------------------------------------------------

Thanks for finding this. The solution is:
- Make equals and hashCode symmetric in what they include, so that both are computed from the same fields
- Cache the hashCode for performance, either by calculating it in the constructor or by lazy initialization using a transient field. An Integer object (initially null) may also be a good candidate for this. No synchronization is needed: different threads may compute the same cached value in parallel, which won't hurt
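
A minimal sketch of the two points above. SketchRunAutomaton is a hypothetical stand-in, not the real RunAutomaton: it carries only the three fields that appear in the issue's hashCode() (alphabetSize, size, points), and the real class may need to compare more state in equals(). The lazy cache is a nullable transient Integer, written without synchronization.

```java
import java.util.Arrays;

/** Hypothetical stand-in for RunAutomaton; not the actual Lucene class. */
final class SketchRunAutomaton {
  private final int alphabetSize;
  private final int size;
  private final int[] points;

  // Lazily computed hash; null means "not computed yet".
  // transient, so the cached value is never serialized.
  private transient Integer cachedHash;

  SketchRunAutomaton(int alphabetSize, int size, int[] points) {
    this.alphabetSize = alphabetSize;
    this.size = size;
    this.points = points.clone();
  }

  @Override
  public boolean equals(Object obj) {
    if (this == obj) return true;
    if (!(obj instanceof SketchRunAutomaton)) return false;
    SketchRunAutomaton other = (SketchRunAutomaton) obj;
    // Same fields as hashCode() below, including the array contents.
    return alphabetSize == other.alphabetSize
        && size == other.size
        && Arrays.equals(points, other.points);
  }

  @Override
  public int hashCode() {
    // Read the field once into a local so a concurrent write cannot
    // change what this call sees between the null check and the return.
    Integer h = cachedHash;
    if (h == null) {
      int result = 1;
      result = 31 * result + alphabetSize;
      result = 31 * result + size;
      result = 31 * result + Arrays.hashCode(points); // contents, not just length
      h = result;
      // Racy but benign: every thread computes the same value.
      cachedHash = h;
    }
    return h;
  }
}
```

Two instances built from equal arguments then compare equal and share a hash code, while instances that differ only in the contents of points are no longer forced to collide. Computing the hash eagerly in the constructor and storing it in a final int field would be an equally simple alternative.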


was (Author: thetaphi):
Thanks for finding this. The solution is:
- Make equals and hashCode symmetric in what they include, so that both are computed from the same fields
- Cache the hashCode for performance, either by calculating it in the constructor or by lazy initialization using a transient field. A SetOnce<Integer> may also be a good candidate for this. No synchronization is needed: different threads may compute the same cached value in parallel, which won't hurt

> RunAutomaton#hashCode() can easily cause hash collision for different Automatons
> --------------------------------------------------------------------------------
>
>                 Key: LUCENE-10610
>                 URL: https://issues.apache.org/jira/browse/LUCENE-10610
>             Project: Lucene - Core
>          Issue Type: Bug
>            Reporter: Tomoko Uchida
>            Priority: Minor
>
> Current RunAutomaton#hashCode() is:
> {code:java}
>   @Override
>   public int hashCode() {
>     final int prime = 31;
>     int result = 1;
>     result = prime * result + alphabetSize;
>     result = prime * result + points.length;
>     result = prime * result + size;
>     return result;
>   }
> {code}
> Since it does not take into account the contents of the {{points}} array, it returns the same value for different automatons whenever their alphabet size, number of points, and state size are the same.
> For example, this test code passes.
> {code:java}
>   public void testHashCode() throws IOException {
>     PrefixQuery q1 = new PrefixQuery(new Term("field", "aba"));
>     PrefixQuery q2 = new PrefixQuery(new Term("field", "fee"));
>     assert q1.compiled.runAutomaton.hashCode() == q2.compiled.runAutomaton.hashCode();
>   }
> {code}
> I suspect this is a bug?
> Note that I don't think it's a serious one: all callers of this {{hashCode()}} take additional information into account when calculating their own hash value, so there seems to be no substantial impact on higher-level APIs.


