Posted to issues@hive.apache.org by "ASF GitHub Bot (Jira)" <ji...@apache.org> on 2021/09/09 23:49:00 UTC

[jira] [Work logged] (HIVE-24936) Fix file name parsing and copy file move.

     [ https://issues.apache.org/jira/browse/HIVE-24936?focusedWorklogId=648920&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-648920 ]

ASF GitHub Bot logged work on HIVE-24936:
-----------------------------------------

                Author: ASF GitHub Bot
            Created on: 09/Sep/21 23:48
            Start Date: 09/Sep/21 23:48
    Worklog Time Spent: 10m 
      Work Description: harishjp opened a new pull request #2628:
URL: https://github.com/apache/hive/pull/2628


   * HIVE-25130: handle spark inserted files during alter table concat
   
   




Issue Time Tracking
-------------------

    Worklog Id:     (was: 648920)
    Time Spent: 1.5h  (was: 1h 20m)

> Fix file name parsing and copy file move.
> -----------------------------------------
>
>                 Key: HIVE-24936
>                 URL: https://issues.apache.org/jira/browse/HIVE-24936
>             Project: Hive
>          Issue Type: Bug
>          Components: HiveServer2
>            Reporter: Harish JP
>            Assignee: Harish JP
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> The taskId and taskAttemptId are not extracted correctly for copy files (e.g. 00001_02_copy_3), and when an incompatible copy file is moved, the rename utility generates wrong file names. Ex: 00001_02_copy_3 is renamed to 00001_02_copy_3_1 if 00001_02_copy_3 already exists; ideally it should be 00001_02_copy_N.
>  
> Incompatible files should always be renamed using the current task; otherwise they can be deleted if the file name conflicts with another task's output file. Ex: if the input file for a task is named 00005_01 and is incompatible, then moving it unchanged makes it look like an output file of task id 5, attempt 1. If that task attempt exists, it will try to generate the same file and fail, and another attempt will be made. There will then be 2 files, 00005_01 and 00005_02, and the deduping code will remove 00005_01, resulting in data loss. There are other scenarios where the same can happen.
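
The naming problem described above can be illustrated with a minimal Java sketch. It is not the code from the pull request: the class CopyFileNames, the methods parse and nextFreeName, and the regular expression are hypothetical, and the sketch assumes file names of the form <taskId>_<attemptId> with an optional _copy_<N> suffix and no extension or other decoration.

```java
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration only; not Hive's actual utility.
class CopyFileNames {
    // Assumed layout: <taskId>_<attemptId>[_copy_<N>], e.g. 00001_02_copy_3.
    private static final Pattern NAME =
        Pattern.compile("^(\\d+)_(\\d+)(?:_copy_(\\d+))?$");

    /** Returns {taskId, attemptId, copyIndex}; copyIndex is 0 without a _copy_N suffix. */
    public static int[] parse(String fileName) {
        Matcher m = NAME.matcher(fileName);
        if (!m.matches()) {
            throw new IllegalArgumentException("Unrecognized file name: " + fileName);
        }
        int copy = m.group(3) == null ? 0 : Integer.parseInt(m.group(3));
        return new int[] {
            Integer.parseInt(m.group(1)), Integer.parseInt(m.group(2)), copy
        };
    }

    /**
     * Returns fileName if it is free, otherwise the first free name of the
     * form <taskId>_<attemptId>_copy_N. The copy index is incremented rather
     * than appending another suffix, so 00001_02_copy_3 never becomes
     * 00001_02_copy_3_1.
     */
    public static String nextFreeName(String fileName, Set<String> existing) {
        if (!existing.contains(fileName)) {
            return fileName;
        }
        Matcher m = NAME.matcher(fileName);
        if (!m.matches()) {
            throw new IllegalArgumentException("Unrecognized file name: " + fileName);
        }
        // Reuse the matched digit strings so zero padding is preserved.
        String base = m.group(1) + "_" + m.group(2);
        int n = m.group(3) == null ? 1 : Integer.parseInt(m.group(3)) + 1;
        while (existing.contains(base + "_copy_" + n)) {
            n++;
        }
        return base + "_copy_" + n;
    }
}
```

Under these assumptions, moving an incompatible file would first build the target name from the current task's id and attempt (not from the input file's name) and then pass it through nextFreeName, so the result cannot be mistaken for the output of another task attempt.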


