Posted to issues@solr.apache.org by "ASF subversion and git services (Jira)" <ji...@apache.org> on 2022/10/26 22:01:00 UTC

[jira] [Commented] (SOLR-10918) Hashing for IntPointFields is broken for HLL

    [ https://issues.apache.org/jira/browse/SOLR-10918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17624772#comment-17624772 ] 

ASF subversion and git services commented on SOLR-10918:
--------------------------------------------------------

Commit 282fb99e1481c2366dbf8446acaad8ba482229fd in solr's branch refs/heads/main from Houston Putman
[ https://gitbox.apache.org/repos/asf?p=solr.git;h=282fb99e148 ]

SOLR-10918: Fix IntPointField hashing for HLL (#1137)



> Hashing for IntPointFields is broken for HLL
> --------------------------------------------
>
>                 Key: SOLR-10918
>                 URL: https://issues.apache.org/jira/browse/SOLR-10918
>             Project: Solr
>          Issue Type: Bug
>            Reporter: Chris M. Hostetter
>            Assignee: Houston Putman
>            Priority: Major
>              Labels: numeric-tries-to-points
>          Time Spent: 20m
>  Remaining Estimate: 0h
>
> Discovered as part of SOLR-10807...
> When using Points-based numerics, the HLL estimates using the raw values vs. the hashed values disagree slightly -- this suggests a possible bug (or at the very least, room for optimization) when using Points fields.
> Example from SOLR-10807 when swapping IntPointField in place of TrieIntField...
> {code}
>    [junit4]   2> NOTE: reproduce with: ant test  -Dtestcase=TestDistributedStatsComponentCardinality -Dtests.method=test -Dtests.seed=63854996088ED7B7 -Dtests.slow=true -Dtests.locale=de-GR -Dtests.timezone=Etc/UCT -Dtests.asserts=true -Dtests.file.encoding=ISO-8859-1
>    [junit4] FAILURE 13.3s J2 | TestDistributedStatsComponentCardinality.test <<<
>    [junit4]    > Throwable #1: java.lang.AssertionError: int_i: hashed vs prehashed, real=7260, p=q=id:[1186+TO+8445]&rows=0&stats=true&stats.field={!cardinality%3Dtrue+hllLog2m%3D7+hllRegwidth%3D8}int_i&stats.field={!cardinality%3Dtrue+hllLog2m%3D7+hllRegwidth%3D8+hllPreHashed%3Dtrue}int_i_prehashed_l&stats.field={!cardinality%3Dtrue+hllLog2m%3D7+hllRegwidth%3D8}long_l&stats.field={!cardinality%3Dtrue+hllLog2m%3D7+hllRegwidth%3D8+hllPreHashed%3Dtrue}long_l_prehashed_l&stats.field={!cardinality%3Dtrue+hllLog2m%3D7+hllRegwidth%3D8}string_s&stats.field={!cardinality%3Dtrue+hllLog2m%3D7+hllRegwidth%3D8+hllPreHashed%3Dtrue}string_s_prehashed_l expected:<6632> but was:<7929>
>    [junit4]    >        at __randomizedtesting.SeedInfo.seed([63854996088ED7B7:EBD1764CA672BA4F]:0)
>    [junit4]    >        at org.apache.solr.handler.component.TestDistributedStatsComponentCardinality.test(TestDistributedStatsComponentCardinality.java:149)
> {code}
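
A minimal sketch of one way a "hashed vs prehashed" mismatch like the one above can arise, offered as an illustration and not as the confirmed cause of this issue: if the same numeric value is hashed as a 4-byte int on one code path and as an 8-byte long on the other, the resulting hashes differ, so an HLL built from one set of hashes will not agree with an HLL built from the other. The class name, the helper names, and the use of FNV-1a are all assumptions made for brevity; none of this is Solr's code, and FNV-1a is not the hash Solr uses.

```java
import java.nio.ByteBuffer;

public class HashWidthSketch {
    // 64-bit FNV-1a: a stand-in hash for illustration only (assumption, not Solr's hash).
    static long fnv1a64(byte[] bytes) {
        long h = 0xcbf29ce484222325L;      // FNV offset basis
        for (byte b : bytes) {
            h ^= (b & 0xffL);              // mix in one byte
            h *= 0x100000001b3L;           // multiply by the FNV prime (wraps mod 2^64)
        }
        return h;
    }

    // Hash the value using its 4-byte big-endian int encoding.
    static long hashAsInt(int v) {
        return fnv1a64(ByteBuffer.allocate(Integer.BYTES).putInt(v).array());
    }

    // Hash the value using its 8-byte big-endian long encoding.
    static long hashAsLong(long v) {
        return fnv1a64(ByteBuffer.allocate(Long.BYTES).putLong(v).array());
    }

    public static void main(String[] args) {
        int value = 7260; // hypothetical sample value
        long viaInt = hashAsInt(value);
        long viaLong = hashAsLong(value);
        // The same number yields two different hashes depending on the encoded width.
        System.out.println("as int : " + Long.toHexString(viaInt));
        System.out.println("as long: " + Long.toHexString(viaLong));
        System.out.println("equal? " + (viaInt == viaLong));
    }
}
```

In this sketch, a prehashed field populated through hashAsLong and a query-time hash computed through hashAsInt would feed different values into the HLL, which is the kind of disagreement the failing assertion reports.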



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
