Posted to dev@hive.apache.org by "Ádám Szita (Jira)" <ji...@apache.org> on 2021/01/25 16:57:00 UTC

[jira] [Created] (HIVE-24683) NPE in Hadoop23Shims due to non-existing delete delta paths

Ádám Szita created HIVE-24683:
---------------------------------

             Summary: NPE in Hadoop23Shims due to non-existing delete delta paths
                 Key: HIVE-24683
                 URL: https://issues.apache.org/jira/browse/HIVE-24683
             Project: Hive
          Issue Type: Bug
            Reporter: Ádám Szita
            Assignee: Ádám Szita


HIVE-23840 introduced the feature of reading delete deltas from the LLAP cache when it is available. This refactoring opened up an opportunity for an NPE:
{code:java}
Caused by: java.lang.NullPointerException
at org.apache.hadoop.hive.shims.Hadoop23Shims.getFileId(Hadoop23Shims.java:1410)
at org.apache.hadoop.hive.ql.io.HdfsUtils.getFileId(HdfsUtils.java:55)
at org.apache.hadoop.hive.llap.io.encoded.OrcEncodedDataReader.determineFileId(OrcEncodedDataReader.java:509)
at org.apache.hadoop.hive.llap.io.encoded.OrcEncodedDataReader.getOrcTailForPath(OrcEncodedDataReader.java:579)
at org.apache.hadoop.hive.llap.io.api.impl.LlapIoImpl.getOrcTailFromCache(LlapIoImpl.java:322)
at org.apache.hadoop.hive.ql.io.orc.VectorizedOrcAcidRowBatchReader.getOrcTail(VectorizedOrcAcidRowBatchReader.java:683)
at org.apache.hadoop.hive.ql.io.orc.VectorizedOrcAcidRowBatchReader.access$500(VectorizedOrcAcidRowBatchReader.java:82)
at org.apache.hadoop.hive.ql.io.orc.VectorizedOrcAcidRowBatchReader$ColumnizedDeleteEventRegistry.<init>(VectorizedOrcAcidRowBatchReader.java:1581){code}
ColumnizedDeleteEventRegistry infers the file name of a delete delta bucket from the bucket number of the corresponding split, but this file may not exist if no deletions happened for that particular bucket.
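For illustration, a hypothetical sketch of that inference (the directory and variable names below are made up for the example, not the actual registry code); ACID bucket files follow the bucket_%05d naming convention:
{code:java}
// Hypothetical illustration: the registry only knows the bucket number of the
// split it is reading and derives the matching file inside a delete delta directory.
int bucketNum = 1;                                       // bucket of the current split
String deleteDeltaDir = "delete_delta_0000005_0000005";  // example delete delta directory
// e.g. delete_delta_0000005_0000005/bucket_00001 -- this path may not exist at all
// if no rows belonging to that bucket were ever deleted
String deleteDeltaBucket = deleteDeltaDir + "/" + String.format("bucket_%05d", bucketNum);
{code}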

Earlier this was handled by always trying to open an ORC reader on the path and catching the resulting FileNotFoundException. In the refactored code, however, we first look into the cache, and for that we try to retrieve a file ID. This entails a file status lookup on HDFS that returns null for non-existing paths, eventually causing the NPE.
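As a rough sketch of where the null slips in (assuming the shim resolves the file status through DFSClient#getFileInfo, which returns null for a missing path rather than throwing; this is a simplification, not the exact source):
{code:java}
// Approximate shape of Hadoop23Shims#getFileId:
public long getFileId(FileSystem fs, String path) throws IOException {
  // getFileInfo() returns null when the path does not exist, so the chained
  // getFileId() call throws an NPE for a non-existing delete delta bucket.
  return ensureDfs(fs).getClient().getFileInfo(path).getFileId();
}
{code}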

This needs to be guarded by a null check in Hadoop23Shims.
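One possible shape of that guard, as a minimal sketch (built on the getFileInfo-based lookup assumed above; throwing FileNotFoundException here simply mirrors what opening the missing file directly would do, the exact handling chosen for the fix may differ):
{code:java}
public long getFileId(FileSystem fs, String path) throws IOException {
  HdfsFileStatus fileStatus = ensureDfs(fs).getClient().getFileInfo(path);
  if (fileStatus == null) {
    // Signal the missing delete delta bucket explicitly instead of letting the
    // chained getFileId() call fail with an NPE.
    throw new FileNotFoundException(path + " does not exist");
  }
  return fileStatus.getFileId();
}
{code}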



--
This message was sent by Atlassian Jira
(v8.3.4#803005)