Posted to issues@tez.apache.org by "Hitesh Shah (JIRA)" <ji...@apache.org> on 2015/04/14 02:57:12 UTC

[jira] [Commented] (TEZ-2315) TEZ does not call checkOutputSpecs on OutputFormat

    [ https://issues.apache.org/jira/browse/TEZ-2315?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14493385#comment-14493385 ] 

Hitesh Shah commented on TEZ-2315:
----------------------------------

That is unlikely to work. The user payload provided to Tez to configure the output is in a binary format. Having the client call the Tez OutputCommitter to run some initial checks, which would then update the binary user payload for the vertex, does not seem like a good API for Tez.

This bug probably needs to be fixed in the tez-execution part of Hive and not in Tez itself. Though from what I understand, Hive uses a "NullOutputCommitter".

> TEZ does not call checkOutputSpecs on OutputFormat
> --------------------------------------------------
>
>                 Key: TEZ-2315
>                 URL: https://issues.apache.org/jira/browse/TEZ-2315
>             Project: Apache Tez
>          Issue Type: Bug
>            Reporter: Sergey Shelukhin
>            Assignee: Siddharth Seth
>
> In JobSubmitter::submitJobInternal, MR does:
> {noformat}
>   private void checkSpecs(Job job) throws ClassNotFoundException, 
>       InterruptedException, IOException {
>     JobConf jConf = (JobConf)job.getConfiguration();
>     // Check the output specification
>     if (jConf.getNumReduceTasks() == 0 ? 
>         jConf.getUseNewMapper() : jConf.getUseNewReducer()) {
>       org.apache.hadoop.mapreduce.OutputFormat<?, ?> output =
>         ReflectionUtils.newInstance(job.getOutputFormatClass(),
>           job.getConfiguration());
>       output.checkOutputSpecs(job);
>     } else {
>       jConf.getOutputFormat().checkOutputSpecs(jtFs, jConf);
>     }
>   }
> {noformat}
> Note that if the OutputFormat instance does not exist, it is created via reflection specifically for this call.
> Tez should also call this. In Hive, via HiveOutputFormatImpl, this method is called on FileSinkOperator, which calls it on custom formats. This is necessary for some of them because they set configuration there.
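
For illustration only, here is a minimal sketch of the pattern under discussion: a client instantiates the output format via reflection, runs its spec check (which may also write configuration), and only then serializes the configuration into a binary payload. It does not use the Hadoop, Hive, or Tez APIs. SketchOutputFormat, DirOutputFormat, the "sketch.*" keys, and the payload encoding are all hypothetical stand-ins chosen so the example is self-contained.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical stand-ins: none of these types are Hadoop, Hive, or Tez classes.
class OutputSpecCheckSketch {

    /** Stand-in for an output format that exposes a pre-submission check. */
    interface SketchOutputFormat {
        void checkOutputSpecs(Map<String, String> conf) throws IOException;
    }

    /** Example format whose check validates the config and also mutates it. */
    static class DirOutputFormat implements SketchOutputFormat {
        public DirOutputFormat() {}

        @Override
        public void checkOutputSpecs(Map<String, String> conf) throws IOException {
            String dir = conf.get("sketch.output.dir");
            if (dir == null || dir.isEmpty()) {
                throw new IOException("Output directory not set");
            }
            // Some formats set configuration during the check, which is why
            // skipping the call can break them later.
            conf.put("sketch.output.checked", "true");
        }
    }

    /** Creates the configured format via reflection just to run its check. */
    static void checkSpecs(Map<String, String> conf)
            throws IOException, ReflectiveOperationException {
        String className = conf.get("sketch.output.format.class");
        SketchOutputFormat format = Class.forName(className)
                .asSubclass(SketchOutputFormat.class)
                .getDeclaredConstructor()
                .newInstance();
        format.checkOutputSpecs(conf);
    }

    /** Assumed payload encoding: sorted key=value lines as UTF-8 bytes. */
    static byte[] toPayload(Map<String, String> conf) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, String> e : new TreeMap<>(conf).entrySet()) {
            sb.append(e.getKey()).append('=').append(e.getValue()).append('\n');
        }
        return sb.toString().getBytes(StandardCharsets.UTF_8);
    }

    /** Runs the check first so its config changes end up in the payload. */
    static byte[] prepare(Map<String, String> conf)
            throws IOException, ReflectiveOperationException {
        checkSpecs(conf);
        return toPayload(conf);
    }

    public static void main(String[] args) throws Exception {
        Map<String, String> conf = new HashMap<>();
        conf.put("sketch.output.format.class", DirOutputFormat.class.getName());
        conf.put("sketch.output.dir", "/tmp/out");
        byte[] payload = prepare(conf);
        System.out.println("payload bytes: " + payload.length);
    }
}
```

The ordering in prepare is the point of the sketch: once the configuration has been turned into bytes, a later check can no longer add settings to it without rewriting the payload, so a client-side check has to run before serialization.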



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)