Posted to dev@pig.apache.org by "Adam Warrington (JIRA)" <ji...@apache.org> on 2011/04/04 20:29:05 UTC

[jira] [Updated] (PIG-1702) Streaming debug output outputs null input-split information

     [ https://issues.apache.org/jira/browse/PIG-1702?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Adam Warrington updated PIG-1702:
---------------------------------

    Attachment: PIG-1702-0.patch

Here is a patch that fixes the header output by retrieving the split information (path, start offset, and length) from the FileSplit.

One potential issue with this code is that it has to obtain a reference to the current MapContext, which it does through PigMapReduce.sJobContext. If Pig is running in local mode, there may be a race condition on that shared reference. PIG-1831 solved a similar issue with the configuration. Would it be wise to use a thread-local variable in PigMapReduce for the context as well?
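
To make the thread-local idea concrete, here is a minimal sketch. It is not the patch and not Pig code: ContextHolder is a hypothetical stand-in for PigMapReduce, and the context is a plain String rather than a MapContext so that the example runs without Hadoop on the classpath. Only the field name sJobContext is taken from the discussion above.

```java
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical stand-in for PigMapReduce. The context type is a String
// here (an assumption made for the sketch) instead of a Hadoop MapContext.
final class ContextHolder {
    // One slot per thread: a task running in one thread never sees the
    // context stored by a task running in another thread.
    private static final ThreadLocal<String> sJobContext = new ThreadLocal<>();

    private ContextHolder() {}

    static void set(String context) { sJobContext.set(context); }

    static String get() { return sJobContext.get(); }

    // Clear the slot when a task ends so reused threads do not keep it.
    static void clear() { sJobContext.remove(); }
}

class ContextHolderDemo {
    public static void main(String[] args) throws InterruptedException {
        ContextHolder.set("context-of-main-thread");

        AtomicReference<String> seenByWorker = new AtomicReference<>();
        Thread worker = new Thread(() -> {
            // Setting the context here does not overwrite the main thread's value.
            ContextHolder.set("context-of-worker-thread");
            seenByWorker.set(ContextHolder.get());
            ContextHolder.clear();
        });
        worker.start();
        worker.join();

        System.out.println(ContextHolder.get());   // context-of-main-thread
        System.out.println(seenByWorker.get());    // context-of-worker-thread
    }
}
```

With a plain static field, the worker's set() call would have replaced the value read by the main thread, which is the kind of interference local mode could run into when several tasks share one JVM.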

> Streaming debug output outputs null input-split information
> -----------------------------------------------------------
>
>                 Key: PIG-1702
>                 URL: https://issues.apache.org/jira/browse/PIG-1702
>             Project: Pig
>          Issue Type: Bug
>          Components: impl
>    Affects Versions: 0.7.0
>            Reporter: Adam Warrington
>            Priority: Minor
>         Attachments: PIG-1702-0.patch
>
>
> Within the Pig streaming command execution, debug information is printed to stderr that specifies the input file as well as split information. The function responsible is org.apache.pig.backend.hadoop.streaming.HadoopExecutableManager.writeDebugHeader(). Pig 0.7 outputs null for the split file, and -1 for the split start-offset and split length. Example output:
> ===== Task Information Header =====
> Command: test.pl (stdin-org.apache.pig.builtin.PigStreaming/stdout-org.apache.pig.builtin.PigStreaming)
> Start time: Mon Oct 25 21:24:45 EDT 2010
> Input-split file: null
> Input-split start-offset: -1
> Input-split length: -1
> Within the writeDebugHeader() function, the input file information is obtained by querying the "map.input.file" configuration variable. This variable was set by the old Hadoop MapReduce API, but it is not set by the 0.20 API, which Pig 0.7 now uses. The new way to get this information is with something like: ((FileSplit) context.getInputSplit()).getPath(). See HADOOP-5973.
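
The lookup described in the issue can be sketched roughly as follows. This is an illustration under stated assumptions, not the attached patch: InputSplitStub and FileSplitStub are hypothetical stand-ins for Hadoop's InputSplit and FileSplit so that the example compiles on its own, and SplitHeader is an invented helper name. The accessor names getPath(), getStart(), and getLength() mirror those on Hadoop's FileSplit, where getPath() returns a Path rather than a String.

```java
// Hypothetical stand-in for Hadoop's InputSplit.
interface InputSplitStub {}

// Hypothetical stand-in for Hadoop's FileSplit; the path is a String here.
final class FileSplitStub implements InputSplitStub {
    private final String path;
    private final long start;
    private final long length;

    FileSplitStub(String path, long start, long length) {
        this.path = path;
        this.start = start;
        this.length = length;
    }

    String getPath() { return path; }
    long getStart() { return start; }
    long getLength() { return length; }
}

// Invented helper that builds the three input-split lines of the header.
final class SplitHeader {
    private SplitHeader() {}

    static String format(InputSplitStub split) {
        // Defaults match what the header printed before: null and -1.
        String file = null;
        long start = -1;
        long length = -1;

        // Check the type before casting: not every split is file based.
        if (split instanceof FileSplitStub) {
            FileSplitStub fileSplit = (FileSplitStub) split;
            file = fileSplit.getPath();
            start = fileSplit.getStart();
            length = fileSplit.getLength();
        }

        return "Input-split file: " + file + "\n"
             + "Input-split start-offset: " + start + "\n"
             + "Input-split length: " + length + "\n";
    }
}
```

The type check is a design choice of the sketch: an unchecked cast like the one quoted in the issue would throw a ClassCastException for a split that is not a FileSplit, whereas here the header falls back to the old null and -1 values.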

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira