Posted to commits@hudi.apache.org by "Alexey Kudinkin (Jira)" <ji...@apache.org> on 2022/04/02 02:38:00 UTC

[jira] [Updated] (HUDI-3776) Fix BloomIndex incorrectly using ColStats to lookup records locations

     [ https://issues.apache.org/jira/browse/HUDI-3776?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Alexey Kudinkin updated HUDI-3776:
----------------------------------
    Description: 
Currently, BloomIndex relies solely on Column Stats (ColStats) to look up record locations. This is incorrect, since the ColStats state might not be complete at any given moment. Instead, we should use it on a best-effort basis (not assuming that it has any record at all), and for those files that are not found in ColStats we should list from them directly.

Search the code for "HUDI-3776" to find the exact location this relates to.

  was:
Currently, BloomIndex relies solely on Column Stats (ColStats) to look up record locations. This is incorrect, since the ColStats state might not be complete at any given moment. Instead, we should use it on a best-effort basis (not assuming that it has any record at all), and for those files that are not found in ColStats we should list from them directly.

 


> Fix BloomIndex incorrectly using ColStats to lookup records locations
> ---------------------------------------------------------------------
>
>                 Key: HUDI-3776
>                 URL: https://issues.apache.org/jira/browse/HUDI-3776
>             Project: Apache Hudi
>          Issue Type: Bug
>            Reporter: Alexey Kudinkin
>            Assignee: Sagar Sumit
>            Priority: Blocker
>             Fix For: 0.11.0
>
>
> Currently, BloomIndex relies solely on Column Stats (ColStats) to look up record locations. This is incorrect, since the ColStats state might not be complete at any given moment. Instead, we should use it on a best-effort basis (not assuming that it has any record at all), and for those files that are not found in ColStats we should list from them directly.
> Search the code for "HUDI-3776" to find the exact location this relates to.
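
The best-effort behaviour described in the issue could be sketched roughly as follows. This is an illustration only, not Hudi's actual code: the function names, the dictionary standing in for ColStats, and the `read_range_from_file` callback are all hypothetical, and it assumes that "going to the file directly" means reading the file's min/max record key from the file itself.

```python
from typing import Callable, Dict, Iterable, List, Tuple

# (min_record_key, max_record_key) for one file; a hypothetical stand-in
# for whatever per-file key information the index needs.
KeyRange = Tuple[str, str]


def load_key_ranges(
    file_names: Iterable[str],
    col_stats: Dict[str, KeyRange],
    read_range_from_file: Callable[[str], KeyRange],
) -> Dict[str, KeyRange]:
    """Resolve a key range for every file, treating col_stats as best effort.

    col_stats may be missing any file (or be empty); every file it does not
    cover is resolved through read_range_from_file instead of being skipped.
    """
    ranges: Dict[str, KeyRange] = {}
    for name in file_names:
        stats = col_stats.get(name)
        if stats is not None:
            ranges[name] = stats  # entry available in the stats index
        else:
            ranges[name] = read_range_from_file(name)  # fallback path
    return ranges


def candidate_files(record_key: str, ranges: Dict[str, KeyRange]) -> List[str]:
    """Return the files whose key range could contain record_key."""
    return sorted(
        name for name, (low, high) in ranges.items() if low <= record_key <= high
    )
```

The point of the sketch is that a file missing from the stats index is still considered during lookup, so an incomplete index costs extra reads rather than producing missed record locations.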



--
This message was sent by Atlassian Jira
(v8.20.1#820001)