Posted to issues@hive.apache.org by "ASF GitHub Bot (Jira)" <ji...@apache.org> on 2022/11/18 07:24:00 UTC

[jira] [Work logged] (HIVE-26758) Allow use scratchdir for staging

     [ https://issues.apache.org/jira/browse/HIVE-26758?focusedWorklogId=827063&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-827063 ]

ASF GitHub Bot logged work on HIVE-26758:
-----------------------------------------

                Author: ASF GitHub Bot
            Created on: 18/Nov/22 07:23
            Start Date: 18/Nov/22 07:23
    Worklog Time Spent: 10m 
      Work Description: yigress opened a new pull request, #3781:
URL: https://github.com/apache/hive/pull/3781

   
   ### What changes were proposed in this pull request?
   
   1. Add a Hive configuration, hive.use.scratchdir.for.staging.
   
   2. For native, non-MM, non-direct-insert, non-ACID tables, change the dynamic partition staging directory layout from
   <dest_path>/<static_partition>/<staging_dir>/<dynamic_partition>
   to 
   <dest_path>/<staging_dir>/<static_partition>/<dynamic_partition>
   
   3. When hive.use.scratchdir.for.staging=true, FileSinkOperator's dirName and DynamicContext's sourcePath change from
   <dest_path>/{hive.exec.stagingdir}
   to
   <hive.exec.scratchdir>
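   The layout changes in items 2 and 3 can be sketched as follows. This is a hypothetical illustration, not Hive's implementation: the function names, the example paths (s3a://bucket/warehouse/sales, .hive-staging_q1, hdfs://nn/tmp/hive/q1) and the plain string joining are assumptions. Only the directory order and the hive.use.scratchdir.for.staging switch come from the description.

```python
# Hypothetical sketch: function names and paths are illustrative, not Hive's API.
from posixpath import join


def old_staging_path(dest, staging, static_part, dynamic_part):
    # Previous layout: <dest_path>/<static_partition>/<staging_dir>/<dynamic_partition>
    return join(dest, static_part, staging, dynamic_part)


def new_staging_path(dest, staging, scratch, static_part, dynamic_part,
                     use_scratchdir_for_staging=False):
    # Staging root is <dest_path>/<staging_dir> by default, or the scratch dir
    # when hive.use.scratchdir.for.staging=true.
    root = scratch if use_scratchdir_for_staging else join(dest, staging)
    # New layout: <root>/<static_partition>/<dynamic_partition>
    return join(root, static_part, dynamic_part)


DEST = "s3a://bucket/warehouse/sales"
STAGING = ".hive-staging_q1"
SCRATCH = "hdfs://nn/tmp/hive/q1"

print(old_staging_path(DEST, STAGING, "ds=2022-11-18", "hr=07"))
# s3a://bucket/warehouse/sales/ds=2022-11-18/.hive-staging_q1/hr=07
print(new_staging_path(DEST, STAGING, SCRATCH, "ds=2022-11-18", "hr=07"))
# s3a://bucket/warehouse/sales/.hive-staging_q1/ds=2022-11-18/hr=07
print(new_staging_path(DEST, STAGING, SCRATCH, "ds=2022-11-18", "hr=07", True))
# hdfs://nn/tmp/hive/q1/ds=2022-11-18/hr=07
```

   Putting the static partition below the staging root is what makes the root relocatable: once the whole partition path sits under one directory, that directory can be swapped for the scratch dir without losing any part of the partition spec.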
   
   
   
   ### Why are the changes needed?
   
   As part of the S3 blobstore optimization, HIVE-15121 / HIVE-17620 changed interim jobs to use hive.exec.scratchdir and the final job to use hive.exec.stagingdir. Whether to use the scratch directory as the staging directory for S3 is still an open question in https://issues.apache.org/jira/browse/HIVE-15215.
   
   However, for blob storage where 'rename' is slow and encryption is not used, staging query results in the scratch dir and using the MoveTask to copy them to blob storage can improve performance. This is especially true when there is a FileMerge task.
   This may also help in cross-filesystem setups, where a user wants to stage query results on the local cluster filesystem and then move them to the destination filesystem.
   
   
   ### Does this PR introduce _any_ user-facing change?
   Yes. It adds a new Hive configuration, hive.use.scratchdir.for.staging.
   
   
   ### How was this patch tested?
   




Issue Time Tracking
-------------------

            Worklog Id:     (was: 827063)
    Remaining Estimate: 0h
            Time Spent: 10m

> Allow use scratchdir for staging
> --------------------------------
>
>                 Key: HIVE-26758
>                 URL: https://issues.apache.org/jira/browse/HIVE-26758
>             Project: Hive
>          Issue Type: New Feature
>          Components: Query Planning
>    Affects Versions: 4.0.0-alpha-2
>            Reporter: Yi Zhang
>            Assignee: Yi Zhang
>            Priority: Minor
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> Query results are staged in a staging directory relative to the destination path: <destination_dir>/<staging_dir>/
> It used to be possible to set hive.exec.stagingdir to a different location, but that ability was lost with the blobstore optimization in HIVE-17620.
> This change allows the final job to use hive.exec.scratchdir, as the interim jobs already do, controlled by a new configuration:
> hive.use.scratchdir.for.staging
> This is useful across filesystems: users can stage results on the local source filesystem instead of the remote filesystem.
> Main change:
> for dynamic partitions with a static partition, the layout was
> <destination_dir>/<static_partition>/<staging_dir>/<dynamic_partition>
> and changes to
> <destination_dir>/<staging_dir>/<static_partition>/<dynamic_partition>
> or, when hive.use.scratchdir.for.staging=true, to
> <scratch_dir>/<static_partition>/<dynamic_partition>
> The layout changes because Hive relies on parsing the path to discover partitions.
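A minimal sketch of why the layout matters, assuming partitions are recovered from unescaped key=value directory segments below the staging root. The function parse_partition_spec and the example paths are hypothetical; this is not Hive's partition-discovery code. In the old layout the static partition sits above the staging directory, so only the dynamic partition can be read from the path below it. In the new layouts the full partition spec lives under the staging root, which is what allows that root to be moved to the scratch dir.

```python
# Hypothetical sketch, not Hive's code: assumes unescaped "key=value" segments
# and a data file name as the last path component.
def parse_partition_spec(file_path, staging_root):
    """Return {column: value} parsed from the directories below staging_root."""
    prefix = staging_root.rstrip("/") + "/"
    if not file_path.startswith(prefix):
        raise ValueError(f"{file_path!r} is not under {staging_root!r}")
    spec = {}
    # Skip the last segment (the data file); keep every key=value directory.
    for segment in file_path[len(prefix):].split("/")[:-1]:
        key, sep, value = segment.partition("=")
        if sep:
            spec[key] = value
    return spec


DEST = "s3a://bucket/warehouse/sales"

# Old layout: the static partition ds=... sits above the staging dir.
old_root = DEST + "/ds=2022-11-18/.hive-staging_q1"
print(parse_partition_spec(old_root + "/hr=07/000000_0", old_root))
# {'hr': '07'}

# New layout: static and dynamic partitions are both below the staging root.
new_root = DEST + "/.hive-staging_q1"
print(parse_partition_spec(new_root + "/ds=2022-11-18/hr=07/000000_0", new_root))
# {'ds': '2022-11-18', 'hr': '07'}

# With hive.use.scratchdir.for.staging=true the root is the scratch dir instead.
scratch_root = "hdfs://nn/tmp/hive/q1"
print(parse_partition_spec(scratch_root + "/ds=2022-11-18/hr=07/000000_0", scratch_root))
# {'ds': '2022-11-18', 'hr': '07'}
```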



--
This message was sent by Atlassian Jira
(v8.20.10#820010)