Posted to issues@hive.apache.org by "Taraka Rama Rao Lethavadla (Jira)" <ji...@apache.org> on 2023/02/13 07:17:00 UTC

[jira] [Commented] (HIVE-27071) Select query with LIMIT clause can fail if there are marker files like "_SUCCESS" and "_MANIFEST"

    [ https://issues.apache.org/jira/browse/HIVE-27071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17687769#comment-17687769 ] 

Taraka Rama Rao Lethavadla commented on HIVE-27071:
---------------------------------------------------

In addition to what is already reported, how about supporting a regex in the query so that files matching it are skipped while the query runs? One advantage is that we could skip the many unwanted files that are not relevant to Hive every time a query is run.
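
To make the idea concrete, here is a minimal sketch of regex-based skipping, written in plain Java with no Hive or Hadoop dependencies. The class name SkipFileFilter, its methods, and the file name part-00000 are hypothetical and used only for illustration; this is not an existing Hive feature or API, and a real change would have to hook into the place where Hive lists the input files.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

// Hypothetical helper: drops paths whose last component matches a skip regex.
public class SkipFileFilter {
    private final Pattern skipPattern;

    public SkipFileFilter(String regex) {
        this.skipPattern = Pattern.compile(regex);
    }

    // Returns the last path component, e.g. "_SUCCESS" for ".../partition/_SUCCESS".
    static String fileName(String path) {
        int slash = path.lastIndexOf('/');
        return slash < 0 ? path : path.substring(slash + 1);
    }

    // A path is accepted when its file name does NOT fully match the skip regex.
    public boolean accept(String path) {
        return !skipPattern.matcher(fileName(path)).matches();
    }

    // Keeps only the accepted paths, preserving their order.
    public List<String> filter(List<String> paths) {
        return paths.stream().filter(this::accept).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // The regex is an assumed example covering the two marker files from this issue.
        SkipFileFilter filter = new SkipFileFilter("_(SUCCESS|MANIFEST)");
        List<String> listed = Arrays.asList(
            "hdfs://name-node-host/table/partition/_SUCCESS",
            "hdfs://name-node-host/table/partition/_MANIFEST",
            "hdfs://name-node-host/table/partition/part-00000"); // hypothetical data file
        System.out.println(filter.filter(listed));
    }
}
```

Matching on the file name rather than the full path keeps the regex short and avoids accidentally skipping a whole table whose directory name happens to match.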

> Select query with LIMIT clause can fail if there are marker files like "_SUCCESS" and "_MANIFEST"
> -------------------------------------------------------------------------------------------------
>
>                 Key: HIVE-27071
>                 URL: https://issues.apache.org/jira/browse/HIVE-27071
>             Project: Hive
>          Issue Type: Bug
>          Components: HiveServer2
>    Affects Versions: 4.0.0
>            Reporter: Sai Hemanth Gantasala
>            Priority: Major
>
> Spark clients create marker files like "_SUCCESS" and "_MANIFEST" under the table/partition path at the end of a write operation, for example 'hdfs://name-node-host/table/partition/_SUCCESS'.
> Whenever Hive reads that table with a LIMIT clause, the query can fail with the following error:
> {code:java}
> ERROR : Vertex failed, vertexName=Map 1, vertexId=vertex_1676095298574_0017_2_00, diagnostics=[Vertex vertex_1676095298574_0017_2_00 [Map 1] killed/failed due to:ROOT_INPUT_INIT_FAILURE, Vertex Input: trade initializer failed, vertex=vertex_1676095298574_0017_2_00 [Map 1], org.apache.hadoop.mapred.InvalidInputException: Input path does not exist: hdfs://name-node-host/table/partition/_MANIFEST
> Input path does not exist: hdfs://name-node-host/table/partition/_SUCCESS
> at org.apache.hadoop.mapred.FileInputFormat.singleThreadedListStatus(FileInputFormat.java:300)
> at org.apache.hadoop.mapred.FileInputFormat.listStatus(FileInputFormat.java:240)
> at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:328)
> at org.apache.hadoop.hive.ql.io.HiveInputFormat.addSplitsForGroup(HiveInputFormat.java:579) {code}
> The Hive execution engine should ignore these marker files while reading the table/partition data.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)