Posted to issues@tez.apache.org by "Hitesh Shah (JIRA)" <ji...@apache.org> on 2015/04/02 03:04:53 UTC
[jira] [Updated] (TEZ-2192) Relocalization does not check for source
[ https://issues.apache.org/jira/browse/TEZ-2192?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Hitesh Shah updated TEZ-2192:
-----------------------------
Attachment: TEZ-2192.2.patch
Comments addressed
> Relocalization does not check for source
> ----------------------------------------
>
> Key: TEZ-2192
> URL: https://issues.apache.org/jira/browse/TEZ-2192
> Project: Apache Tez
> Issue Type: Bug
> Affects Versions: 0.6.0, 0.5.2
> Reporter: Rohini Palaniswamy
> Assignee: Hitesh Shah
> Priority: Blocker
> Attachments: TEZ-2192.1.patch, TEZ-2192.2.patch
>
>
> PIG-4443 spills the input splits to disk if the serialized split size is greater than some threshold. It runs into problems with relocalization when more than one vertex has a job.split file: if a job.split file is already present when a container is reused, that file is reused, causing the wrong data to be read.
> We need either a way to turn off relocalization, or relocalization needs to check the source and timestamp and re-download the file.
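
The second option in the description (compare source and timestamp before reusing a file) can be sketched roughly as below. This is only an illustration of the idea, not the contents of the attached patch: the class RelocalizationCheck, its methods, and the example URIs are hypothetical names and are not Tez or YARN APIs.

```java
import java.net.URI;
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

/** Hypothetical sketch; none of these names are Tez or YARN APIs. */
final class RelocalizationCheck {

    /** Identity of a localized file: where it came from and its modification time. */
    static final class ResourceKey {
        final URI source;
        final long timestamp;

        ResourceKey(URI source, long timestamp) {
            this.source = Objects.requireNonNull(source);
            this.timestamp = timestamp;
        }

        @Override
        public boolean equals(Object o) {
            if (!(o instanceof ResourceKey)) {
                return false;
            }
            ResourceKey other = (ResourceKey) o;
            return timestamp == other.timestamp && source.equals(other.source);
        }

        @Override
        public int hashCode() {
            return Objects.hash(source, timestamp);
        }
    }

    // Local file name (for example "job.split") -> identity of the copy already on disk.
    private final Map<String, ResourceKey> localized = new HashMap<>();

    /** True when no local copy exists or the local copy came from a different source or timestamp. */
    boolean needsDownload(String localName, URI source, long timestamp) {
        ResourceKey existing = localized.get(localName);
        return !new ResourceKey(source, timestamp).equals(existing);
    }

    /** Records that localName now holds the content identified by source and timestamp. */
    void recordLocalized(String localName, URI source, long timestamp) {
        localized.put(localName, new ResourceKey(source, timestamp));
    }
}
```

With a check like this, a reused container that already holds a job.split from one vertex would download again when a second vertex asks for a job.split with a different source path or timestamp, instead of matching on the file name alone.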
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)