Posted to mapreduce-issues@hadoop.apache.org by "Gil Vernik (JIRA)" <ji...@apache.org> on 2017/03/02 12:01:45 UTC

[jira] [Updated] (MAPREDUCE-6854) Each map task should create a unique temporary name that includes object name

     [ https://issues.apache.org/jira/browse/MAPREDUCE-6854?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Gil Vernik updated MAPREDUCE-6854:
----------------------------------
    Description: 
Consider an example: a local file "/data/a.txt" needs to be copied to swift://container.service/data/a.txt

distcp first uploads "/data/a.txt" to swift://container.mil01/data/.distcp.tmp.attempt_local2036034928_0001_m_000000_0

Upon completion, distcp moves swift://container.mil01/data/.distcp.tmp.attempt_local2036034928_0001_m_000000_0 to swift://container.mil01/data/a.txt

The temporary file naming convention assumes that each map task will sequentially create objects as swift://container.mil01/.distcp.tmp.attempt_ID
and then rename them to their final names. This flow is problematic in object stores, where it is usually advised not to create, delete, and then re-create an object under the same name.
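
The reuse of one temporary name can be sketched as follows. This is only an illustration built from the example paths above, not distcp's actual code: the class and method names are hypothetical, and "b.txt" is an assumed second file handled by the same map task.

```java
public class CurrentTempNaming {
    // Temporary name as in the example above: it depends only on the
    // target's parent directory and the task attempt ID, not on the file name.
    static String tempPath(String targetFile, String attemptId) {
        String parent = targetFile.substring(0, targetFile.lastIndexOf('/'));
        return parent + "/.distcp.tmp." + attemptId;
    }

    public static void main(String[] args) {
        String attempt = "attempt_local2036034928_0001_m_000000_0";
        // "b.txt" is a hypothetical second file copied by the same map task.
        String[] targets = {
            "swift://container.mil01/data/a.txt",
            "swift://container.mil01/data/b.txt"
        };
        for (String target : targets) {
            // Both iterations create an object under the same temporary name,
            // which is then renamed away before the next file is copied.
            System.out.println("create " + tempPath(target, attempt));
            System.out.println("rename to " + target);
        }
    }
}
```

Because the temporary name ignores the file being copied, every file a task writes into the same directory goes through the same object name.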

This JIRA proposes adding a configuration key indicating that temporary objects should also include the object name as part of their temporary file name.

For example,
"/data/a.txt" will be uploaded to "swift://container.mil01/data/.distcp.tmp.attempt_local2036034928_0001_m_000000_0/a.txt" or
"swift://container.mil01/data/a.txt/.distcp.tmp.attempt_local2036034928_0001_m_000000_0"
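
A minimal sketch of the two candidate layouts, using plain string handling. The class and method names are hypothetical, and the configuration key is left out because this issue does not name one.

```java
public class ProposedTempNaming {
    static final String PREFIX = ".distcp.tmp.";

    static String parentOf(String path) {
        return path.substring(0, path.lastIndexOf('/'));
    }

    static String nameOf(String path) {
        return path.substring(path.lastIndexOf('/') + 1);
    }

    // Variant 1: <parent>/.distcp.tmp.<attempt>/<object name>
    static String nameUnderTempDir(String target, String attemptId) {
        return parentOf(target) + "/" + PREFIX + attemptId + "/" + nameOf(target);
    }

    // Variant 2: <target>/.distcp.tmp.<attempt>
    static String tempUnderTarget(String target, String attemptId) {
        return target + "/" + PREFIX + attemptId;
    }

    public static void main(String[] args) {
        String attempt = "attempt_local2036034928_0001_m_000000_0";
        String target = "swift://container.mil01/data/a.txt";
        System.out.println(nameUnderTempDir(target, attempt));
        System.out.println(tempUnderTarget(target, attempt));
    }
}
```

In both variants the temporary name differs per copied object, so a map task no longer creates, deletes, and re-creates the same object name.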

  was:
Consider an example: a local file "/data/a.txt" needs to be copied to swift://container.service/data/a.txt

distcp first uploads "/data/a.txt" to swift://container.mil01/data3/.distcp.tmp.attempt_local2036034928_0001_m_000000_0

Upon completion, distcp moves swift://container.mil01/data/.distcp.tmp.attempt_local2036034928_0001_m_000000_0 to swift://container.mil01/data/a.txt

The temporary file naming convention assumes that each map task will sequentially create objects as swift://container.mil01/.distcp.tmp.attempt_ID
and then rename them to their final names. This flow is problematic in object stores, where it is usually advised not to create, delete, and then re-create an object under the same name.

This JIRA proposes adding a configuration key indicating that temporary objects should also include the object name as part of their temporary file name.

For example,
"/data/a.txt" will be uploaded to "swift://container.mil01/data/.distcp.tmp.attempt_local2036034928_0001_m_000000_0/a.txt" or
"swift://container.mil01/data/a.txt/.distcp.tmp.attempt_local2036034928_0001_m_000000_0"


> Each map task should create a unique temporary name that includes object name
> -----------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-6854
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6854
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: distcp
>            Reporter: Gil Vernik


