Posted to issues@hive.apache.org by "ASF GitHub Bot (Jira)" <ji...@apache.org> on 2021/01/26 13:39:00 UTC

[jira] [Work logged] (HIVE-24683) Hadoop23Shims getFileId prone to NPE for non-existing paths

     [ https://issues.apache.org/jira/browse/HIVE-24683?focusedWorklogId=542177&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-542177 ]

ASF GitHub Bot logged work on HIVE-24683:
-----------------------------------------

                Author: ASF GitHub Bot
            Created on: 26/Jan/21 13:38
            Start Date: 26/Jan/21 13:38
    Worklog Time Spent: 10m 
      Work Description: szlta opened a new pull request #1911:
URL: https://github.com/apache/hive/pull/1911


   HIVE-23840 introduced the feature of reading delete deltas from the LLAP cache when available. This refactoring opened up the possibility of an NPE:
   
   Caused by: java.lang.NullPointerException
   at org.apache.hadoop.hive.shims.Hadoop23Shims.getFileId(Hadoop23Shims.java:1410)
   at org.apache.hadoop.hive.ql.io.HdfsUtils.getFileId(HdfsUtils.java:55)
   at org.apache.hadoop.hive.llap.io.encoded.OrcEncodedDataReader.determineFileId(OrcEncodedDataReader.java:509)
   at org.apache.hadoop.hive.llap.io.encoded.OrcEncodedDataReader.getOrcTailForPath(OrcEncodedDataReader.java:579)
   at org.apache.hadoop.hive.llap.io.api.impl.LlapIoImpl.getOrcTailFromCache(LlapIoImpl.java:322)
   at org.apache.hadoop.hive.ql.io.orc.VectorizedOrcAcidRowBatchReader.getOrcTail(VectorizedOrcAcidRowBatchReader.java:683)
   at org.apache.hadoop.hive.ql.io.orc.VectorizedOrcAcidRowBatchReader.access$500(VectorizedOrcAcidRowBatchReader.java:82)
   at org.apache.hadoop.hive.ql.io.orc.VectorizedOrcAcidRowBatchReader$ColumnizedDeleteEventRegistry.<init>(VectorizedOrcAcidRowBatchReader.java:1581)
   ColumnizedDeleteEventRegistry infers the file name of a delete delta bucket from the bucket number of the corresponding split, but this file may not exist if no deletions happened in that particular bucket.
   
   Earlier this was handled by always attempting to open an ORC reader on the path and catching the resulting FileNotFoundException. After the refactoring, however, we first look into the cache, and for that we first try to retrieve a file ID. This entails a getFileStatus call on HDFS, which returns null for non-existing paths and eventually causes the NPE.
   
   This particular call site was later fixed by HIVE-23956; nevertheless, Hadoop23Shims.getFileId should be refactored so that it is no longer error-prone.
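
   The failure mode and the proposed fix can be sketched in plain Java without a Hadoop dependency. StatusLookup, getFileIdUnsafe, and getFileIdSafe below are hypothetical stand-ins for illustration, not Hive or Hadoop APIs; the point is only the pattern: a lookup that returns null for a missing path must be guarded, and the null translated into the FileNotFoundException that callers already expect.

```java
import java.io.FileNotFoundException;
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a file-status lookup that, like the
// getFileStatus call described above, returns null for a path that
// does not exist instead of throwing.
class StatusLookup {
    private final Map<String, Long> idsByPath = new HashMap<>();

    void put(String path, long fileId) { idsByPath.put(path, fileId); }

    Long statusFor(String path) { return idsByPath.get(path); } // null if absent
}

public class GetFileIdSketch {
    // Error-prone shape: dereferences the status without a null check,
    // mirroring the NPE seen at Hadoop23Shims.getFileId.
    static long getFileIdUnsafe(StatusLookup fs, String path) {
        Long status = fs.statusFor(path);
        return status.longValue(); // NPE if the path does not exist
    }

    // Refactored shape: turns the null into an explicit
    // FileNotFoundException, matching what callers of the old
    // ORC-reader path already handled.
    static long getFileIdSafe(StatusLookup fs, String path) throws FileNotFoundException {
        Long status = fs.statusFor(path);
        if (status == null) {
            throw new FileNotFoundException("No such file: " + path);
        }
        return status.longValue();
    }

    public static void main(String[] args) throws Exception {
        StatusLookup fs = new StatusLookup();
        // Only bucket_00000 has delete events; bucket_00001 was never written.
        fs.put("/warehouse/t/delete_delta_0000002_0000002/bucket_00000", 42L);

        System.out.println(getFileIdSafe(fs, "/warehouse/t/delete_delta_0000002_0000002/bucket_00000"));

        try {
            getFileIdSafe(fs, "/warehouse/t/delete_delta_0000002_0000002/bucket_00001");
        } catch (FileNotFoundException e) {
            System.out.println("caught: " + e.getMessage());
        }
    }
}
```

   The design point is that a missing delete delta bucket is an expected condition, so it should surface as a checked exception (or an Optional) at the lowest level, rather than leaking a null that blows up several frames higher in the LLAP cache path.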


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Issue Time Tracking
-------------------

            Worklog Id:     (was: 542177)
    Remaining Estimate: 0h
            Time Spent: 10m

> Hadoop23Shims getFileId prone to NPE for non-existing paths
> -----------------------------------------------------------
>
>                 Key: HIVE-24683
>                 URL: https://issues.apache.org/jira/browse/HIVE-24683
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Ádám Szita
>            Assignee: Ádám Szita
>            Priority: Major
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> HIVE-23840 introduced the feature of reading delete deltas from the LLAP cache when available. This refactoring opened up the possibility of an NPE:
> {code:java}
> Caused by: java.lang.NullPointerException
> at org.apache.hadoop.hive.shims.Hadoop23Shims.getFileId(Hadoop23Shims.java:1410)
> at org.apache.hadoop.hive.ql.io.HdfsUtils.getFileId(HdfsUtils.java:55)
> at org.apache.hadoop.hive.llap.io.encoded.OrcEncodedDataReader.determineFileId(OrcEncodedDataReader.java:509)
> at org.apache.hadoop.hive.llap.io.encoded.OrcEncodedDataReader.getOrcTailForPath(OrcEncodedDataReader.java:579)
> at org.apache.hadoop.hive.llap.io.api.impl.LlapIoImpl.getOrcTailFromCache(LlapIoImpl.java:322)
> at org.apache.hadoop.hive.ql.io.orc.VectorizedOrcAcidRowBatchReader.getOrcTail(VectorizedOrcAcidRowBatchReader.java:683)
> at org.apache.hadoop.hive.ql.io.orc.VectorizedOrcAcidRowBatchReader.access$500(VectorizedOrcAcidRowBatchReader.java:82)
> at org.apache.hadoop.hive.ql.io.orc.VectorizedOrcAcidRowBatchReader$ColumnizedDeleteEventRegistry.<init>(VectorizedOrcAcidRowBatchReader.java:1581){code}
> ColumnizedDeleteEventRegistry infers the file name of a delete delta bucket from the bucket number of the corresponding split, but this file may not exist if no deletions happened in that particular bucket.
> Earlier this was handled by always attempting to open an ORC reader on the path and catching the resulting FileNotFoundException. After the refactoring, however, we first look into the cache, and for that we first try to retrieve a file ID. This entails a getFileStatus call on HDFS, which returns null for non-existing paths and eventually causes the NPE.
> This particular call site was later fixed by HIVE-23956; nevertheless, Hadoop23Shims.getFileId should be refactored so that it is no longer error-prone.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)