Posted to issues@hive.apache.org by "Sankar Hariappan (JIRA)" <ji...@apache.org> on 2017/08/11 09:00:00 UTC

[jira] [Comment Edited] (HIVE-17289) EXPORT and IMPORT shouldn't perform distcp with doAs privileged user.

    [ https://issues.apache.org/jira/browse/HIVE-17289?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16123051#comment-16123051 ] 

Sankar Hariappan edited comment on HIVE-17289 at 8/11/17 8:59 AM:
------------------------------------------------------------------

Added 01.patch with the following changes.
- Used CopyUtils to copy files from ReplCopyTask (IMPORT/REPL LOAD).
- Set the distcp doAs input to null for the EXPORT and IMPORT flows. REPL LOAD will use the user config hive.distcp.privileged.doAs.
- Assumed lazy copy is set only for REPL LOAD, and hence set the doAs user input to hive.distcp.privileged.doAs if lazy copy is true and to null if it is false. This is just to avoid passing this argument through multiple flows; also, incremental REPL LOAD shares common code with IMPORT.
- Removed redundant code in ReplCopyTask/ReplCopyWork, as it now re-uses the CopyUtils implementation, which does the same thing.
- Refactored ReplCopyTask.execute to properly distinguish the code path that reads _files from the one that handles actual data files.
- Set the default value of hive.distcp.privileged.doAs to "hive".
- Moved CopyUtils from the parse.repl.dump.io package to the parse.repl package, as it is common to dump and load.
- No tests added, as the existing tests already cover the changes, except for the distcp flow (due to hive.in.test), which needs to be tested manually.
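
As a rough illustration of the doAs selection described in the list above (this is not the code from the patch; the class name, method name, and the use of a plain Map in place of HiveConf are hypothetical and only for this sketch):

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch only: not the actual Hive classes or the patch code.
class DistCpDoAsSketch {
    // Config key and default value as named in the comment above.
    static final String DOAS_KEY = "hive.distcp.privileged.doAs";
    static final String DOAS_DEFAULT = "hive";

    /**
     * Hypothetical helper. Returns the user that distcp should run as,
     * or null when distcp should run as the current user.
     *
     * Assumption from the comment: lazy copy is set only for REPL LOAD,
     * so lazyCopy == false means the EXPORT/IMPORT flow.
     */
    static String chooseDoAsUser(boolean lazyCopy, Map<String, String> conf) {
        if (!lazyCopy) {
            // EXPORT / IMPORT: no privileged doAs user.
            return null;
        }
        // REPL LOAD: use the configured privileged user, falling back to the default.
        return conf.getOrDefault(DOAS_KEY, DOAS_DEFAULT);
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        System.out.println("IMPORT/EXPORT doAs: " + chooseDoAsUser(false, conf));
        System.out.println("REPL LOAD doAs:     " + chooseDoAsUser(true, conf));
    }
}
```

Keying the decision off the lazy copy flag keeps the call sites unchanged, at the cost of relying on the assumption that only REPL LOAD ever sets that flag.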


was (Author: sankarh):
Added 01.patch with the following changes.
- Used CopyUtils to copy files from ReplCopyTask (IMPORT/REPL LOAD).
- Set the distcp doAs input to null for the EXPORT and IMPORT flows. REPL LOAD will use the user config hive.distcp.privileged.doAs.
- Assumed lazy copy is set only for REPL LOAD, and hence set the doAs user input to hive.distcp.privileged.doAs if lazy copy is true and to null if it is false. This is just to avoid passing this argument through multiple flows; also, incremental REPL LOAD shares common code with IMPORT.
- Removed redundant code in ReplCopyTask/ReplCopyWork, as it now re-uses the CopyUtils implementation, which does the same thing.
- Refactored ReplCopyTask.execute to properly distinguish the code path that reads _files from the one that handles actual data files.
- Set the default value of hive.distcp.privileged.doAs to "hive".
- No tests added, as the existing tests already cover the changes, except for the distcp flow (due to hive.in.test), which needs to be tested manually.

> EXPORT and IMPORT shouldn't perform distcp with doAs privileged user.
> ---------------------------------------------------------------------
>
>                 Key: HIVE-17289
>                 URL: https://issues.apache.org/jira/browse/HIVE-17289
>             Project: Hive
>          Issue Type: Sub-task
>          Components: HiveServer2, repl
>    Affects Versions: 3.0.0
>            Reporter: Sankar Hariappan
>            Assignee: Sankar Hariappan
>              Labels: DR, Export, Import, replication
>             Fix For: 3.0.0
>
>         Attachments: HIVE-17289.01.patch
>
>
> Currently, EXPORT uses distcp to dump data files to the dump directory, and IMPORT uses distcp to copy larger files or large numbers of files from the dump directory to the table staging directory. But this copy fails, as distcp is always run with the doAs user specified in hive.distcp.privileged.doAs, which is "hdfs" by default.
> Need to remove the usage of the doAs user when running distcp from the EXPORT/IMPORT flow.
> Privileged-user-based distcp should be done only for the REPL DUMP/LOAD commands.
> Also, need to set the default value of hive.distcp.privileged.doAs to "hive", as the "hdfs" super-user is never allowed.


