Posted to commits@hudi.apache.org by "Hao (Jira)" <ji...@apache.org> on 2022/06/05 16:34:00 UTC

[jira] [Commented] (HUDI-4192) HoodieHFileReader scan the cells of Header IndexColumn throw NullPointerException

    [ https://issues.apache.org/jira/browse/HUDI-4192?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17550233#comment-17550233 ] 

Hao commented on HUDI-4192:
---------------------------

The steps to reproduce are as follows:

The attached HFile contains indexes for 3 columns, in this order: "dtm", "hh", and "dvce_id". A simple reproduction is as follows:
{code:java}
    HoodieHFileReader<GenericRecord> hfileReader =
        (HoodieHFileReader<GenericRecord>) createReader(new Configuration());
    List<String> keyPrefixes = new ArrayList<>();
    keyPrefixes.add("YAWXdbh2gWI=");  // key prefix of "dvce_id"
    keyPrefixes.add("Bkmxu5plBpg=");  // key prefix of "dtm"
    Iterator<GenericRecord> iterator = hfileReader.getRecordsByKeyPrefixIterator(keyPrefixes);
    while (iterator.hasNext()) {
      GenericRecord record = iterator.next();  // throws NullPointerException
    }
{code}
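
The failure mode can also be illustrated without Hudi or HBase. The following is a minimal sketch, not the actual HoodieHFileReader code: ToyScanner, scanFromCurrent, scanWithSeek, and the "col_N|fN" keys are all hypothetical names made up for this illustration. It assumes a forward-only cursor over sorted keys whose current() returns null at the end of the file, and shows that scanning a later prefix first and then an earlier one fails unless the cursor is repositioned before each scan.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class ForwardScannerSketch {

    /** Toy forward-only cursor over sorted keys; a stand-in, not the HBase HFileScanner API. */
    static final class ToyScanner {
        private final List<String> sortedKeys;
        private int pos = 0;

        ToyScanner(List<String> sortedKeys) {
            this.sortedKeys = sortedKeys;
        }

        /** Returns the key under the cursor, or null once the cursor is past the last key. */
        String current() {
            return pos < sortedKeys.size() ? sortedKeys.get(pos) : null;
        }

        void next() {
            pos++;
        }

        /** Repositions the cursor at the first key that is >= target. */
        void seekTo(String target) {
            pos = 0;
            while (pos < sortedKeys.size() && sortedKeys.get(pos).compareTo(target) < 0) {
                pos++;
            }
        }
    }

    /** Collects keys with the given prefix, starting from wherever the cursor already is. */
    static List<String> scanFromCurrent(ToyScanner scanner, String prefix) {
        List<String> out = new ArrayList<>();
        // Unchecked dereference: throws NullPointerException if the cursor is at end of file.
        while (scanner.current().compareTo(prefix) < 0) {
            scanner.next();
        }
        while (scanner.current() != null && scanner.current().startsWith(prefix)) {
            out.add(scanner.current());
            scanner.next();
        }
        return out;
    }

    /** Same scan, but the cursor is repositioned for every prefix before reading. */
    static List<String> scanWithSeek(ToyScanner scanner, String prefix) {
        List<String> out = new ArrayList<>();
        scanner.seekTo(prefix);
        while (scanner.current() != null && scanner.current().startsWith(prefix)) {
            out.add(scanner.current());
            scanner.next();
        }
        return out;
    }

    /** Hypothetical sorted keys: three indexed columns, the last one ending the file. */
    static List<String> sampleKeys() {
        return Arrays.asList("col_1|f1", "col_1|f2", "col_2|f1", "col_3|f1", "col_3|f2");
    }

    public static void main(String[] args) {
        ToyScanner fixed = new ToyScanner(sampleKeys());
        System.out.println(scanWithSeek(fixed, "col_3"));
        System.out.println(scanWithSeek(fixed, "col_1"));

        ToyScanner buggy = new ToyScanner(sampleKeys());
        System.out.println(scanFromCurrent(buggy, "col_3"));
        try {
            scanFromCurrent(buggy, "col_1");
        } catch (NullPointerException e) {
            System.out.println("NullPointerException on the second prefix");
        }
    }
}
```

In this toy model, sorting the key prefixes before scanning would be another way to avoid the backward jump. Whether either approach matches the eventual fix in Hudi is not something this sketch claims.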

> HoodieHFileReader scan the cells of Header IndexColumn throw NullPointerException
> ---------------------------------------------------------------------------------
>
>                 Key: HUDI-4192
>                 URL: https://issues.apache.org/jira/browse/HUDI-4192
>             Project: Apache Hudi
>          Issue Type: Bug
>            Reporter: Hao
>            Priority: Minor
>             Fix For: 0.12.0
>
>         Attachments: col-stats-0097_86-717-560846_20220605111639266001.hfile
>
>
> Assume we index N columns in the MetaTable, such as col_1, col_2, ..., col_n.
> Consider executing a query such as "{*}select * from table where col_n = 'xx' and col_1 = 'xx'{*}".
> Scanning the HFiles of the MetaTable then actually takes 2 steps:
> First, the col_n cells are scanned in the HFile (mainly to obtain the min/max values). Once that scan completes, the scanner is already at the end of the file.
> Second, the col_1 cells are scanned. Because seekTo is not called beforehand to move the scanner back to the start of the file, the call to scanner.getCell results in a NullPointerException.
>  
> !https://issues.apache.org/vision-file-storage/api/file/download/upload-v2/2022/5/5/h00424960/f2c6fbea023242939c3e804a280cd642/image.png!



--
This message was sent by Atlassian Jira
(v8.20.7#820007)