Posted to dev@pig.apache.org by "Viraj Bhat (JIRA)" <ji...@apache.org> on 2009/01/13 23:31:03 UTC

[jira] Created: (PIG-619) Dumping empty results produces "Unable to get results for /tmp/temp-1964806069/tmp256878619 org.apache.pig.builtin.BinStorage" message

Dumping empty results produces "Unable to get results for /tmp/temp-1964806069/tmp256878619  org.apache.pig.builtin.BinStorage" message
---------------------------------------------------------------------------------------------------------------------------------------

                 Key: PIG-619
                 URL: https://issues.apache.org/jira/browse/PIG-619
             Project: Pig
          Issue Type: Bug
          Components: impl
    Affects Versions: types_branch
         Environment: Hadoop 18, Multi-node hadoop installation
            Reporter: Viraj Bhat
             Fix For: types_branch


The following Pig script stores empty filter results in the 'emptyfilteredlogs' HDFS directory. It then reloads this data from the empty HDFS directory for additional grouping and counting. The script succeeds on a single-node Hadoop installation with the following message, since the alias COUNT_EMPTYFILTERED_LOGS contains no data.
==============================================================================================================
2009-01-13 21:47:08,988 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Success!
==============================================================================================================
But on a multi-node Hadoop installation, the script fails with the following error:
==============================================================================================================
2009-01-13 13:48:34,602 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Success!
java.io.IOException: Unable to open iterator for alias: COUNT_EMPTYFILTERED_LOGS [Unable to get results for /tmp/temp-1964806069/tmp256878619:org.apache.pig.builtin.BinStorage]
        at org.apache.pig.backend.hadoop.executionengine.HJob.getResults(HJob.java:74)
        at org.apache.pig.PigServer.openIterator(PigServer.java:408)
        at org.apache.pig.tools.grunt.GruntParser.processDump(GruntParser.java:269)
        at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:178)
        at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:84)
        at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:64)
        at org.apache.pig.Main.main(Main.java:306)
Caused by: org.apache.pig.backend.executionengine.ExecException: Unable to get results for /tmp/temp-1964806069/tmp256878619:org.apache.pig.builtin.BinStorage
        ... 7 more
Caused by: java.io.IOException: /tmp/temp-1964806069/tmp256878619 does not exist
        at org.apache.pig.impl.io.FileLocalizer.openDFSFile(FileLocalizer.java:188)
        at org.apache.pig.impl.io.FileLocalizer.open(FileLocalizer.java:291)
        at org.apache.pig.backend.hadoop.executionengine.HJob.getResults(HJob.java:69)
        ... 6 more
==============================================================================================================
{code}
RAW_LOGS = load 'mydata.txt' as (url:chararray, numvisits:int);
RAW_LOGS = limit RAW_LOGS 2;
FILTERED_LOGS = filter RAW_LOGS by numvisits < 0;
store FILTERED_LOGS into 'emptyfilteredlogs' using PigStorage();
EMPTY_FILTERED_LOGS = load 'emptyfilteredlogs' as (url:chararray, numvisits:int);
GROUP_EMPTYFILTERED_LOGS = group EMPTY_FILTERED_LOGS by numvisits;
COUNT_EMPTYFILTERED_LOGS = foreach GROUP_EMPTYFILTERED_LOGS generate
                             group, COUNT(EMPTY_FILTERED_LOGS);
explain COUNT_EMPTYFILTERED_LOGS;
dump COUNT_EMPTYFILTERED_LOGS;
{code}

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Assigned: (PIG-619) Dumping empty results produces "Unable to get results for /tmp/temp-1964806069/tmp256878619 org.apache.pig.builtin.BinStorage" message

Posted by "Olga Natkovich (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/PIG-619?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Olga Natkovich reassigned PIG-619:
----------------------------------

    Assignee: Alan Gates

> Dumping empty results produces "Unable to get results for /tmp/temp-1964806069/tmp256878619  org.apache.pig.builtin.BinStorage" message
> ---------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: PIG-619
>                 URL: https://issues.apache.org/jira/browse/PIG-619
>             Project: Pig
>          Issue Type: Bug
>          Components: impl
>    Affects Versions: 0.2.0
>         Environment: Hadoop 18, Multi-node hadoop installation
>            Reporter: Viraj Bhat
>            Assignee: Alan Gates
>         Attachments: mydata.txt, tmpfileload.pig


[jira] Commented: (PIG-619) Dumping empty results produces "Unable to get results for /tmp/temp-1964806069/tmp256878619 org.apache.pig.builtin.BinStorage" message

Posted by "Hadoop QA (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/PIG-619?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12710170#action_12710170 ] 

Hadoop QA commented on PIG-619:
-------------------------------

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12408213/PIG-619.patch
  against trunk revision 775340.

    +1 @author.  The patch does not contain any @author tags.

    -1 tests included.  The patch doesn't appear to include any new or modified tests.
                        Please justify why no tests are needed for this patch.

    -1 patch.  The patch command could not apply the patch.

Console output: http://hudson.zones.apache.org/hudson/job/Pig-Patch-minerva.apache.org/44/console

This message is automatically generated.

> Dumping empty results produces "Unable to get results for /tmp/temp-1964806069/tmp256878619  org.apache.pig.builtin.BinStorage" message
> ---------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: PIG-619
>                 URL: https://issues.apache.org/jira/browse/PIG-619
>             Project: Pig
>          Issue Type: Bug
>          Components: impl
>    Affects Versions: 0.2.0
>         Environment: Hadoop 18, Multi-node hadoop installation
>            Reporter: Viraj Bhat
>            Assignee: Alan Gates
>             Fix For: 0.3.0
>
>         Attachments: mydata.txt, PIG-619.patch, tmpfileload.pig


[jira] Updated: (PIG-619) Dumping empty results produces "Unable to get results for /tmp/temp-1964806069/tmp256878619 org.apache.pig.builtin.BinStorage" message

Posted by "Viraj Bhat (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/PIG-619?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Viraj Bhat updated PIG-619:
---------------------------

    Attachment: tmpfileload.pig

Dumping empty results script

> Dumping empty results produces "Unable to get results for /tmp/temp-1964806069/tmp256878619  org.apache.pig.builtin.BinStorage" message
> ---------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: PIG-619
>                 URL: https://issues.apache.org/jira/browse/PIG-619
>             Project: Pig
>          Issue Type: Bug
>          Components: impl
>    Affects Versions: types_branch
>         Environment: Hadoop 18, Multi-node hadoop installation
>            Reporter: Viraj Bhat
>             Fix For: types_branch
>
>         Attachments: tmpfileload.pig


[jira] Commented: (PIG-619) Dumping empty results produces "Unable to get results for /tmp/temp-1964806069/tmp256878619 org.apache.pig.builtin.BinStorage" message

Posted by "Alan Gates (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/PIG-619?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12703940#action_12703940 ] 

Alan Gates commented on PIG-619:
--------------------------------

Does fixing this still make sense?  IIRC the main reason for doing the store/load thing in the middle was to deal with the fact that Pig couldn't do multiple stores in one script without re-running the entire script.  But since that is in the process of being changed (see PIG-627), this should no longer be necessary.

> Dumping empty results produces "Unable to get results for /tmp/temp-1964806069/tmp256878619  org.apache.pig.builtin.BinStorage" message
> ---------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: PIG-619
>                 URL: https://issues.apache.org/jira/browse/PIG-619
>             Project: Pig
>          Issue Type: Bug
>          Components: impl
>    Affects Versions: 0.2.0
>         Environment: Hadoop 18, Multi-node hadoop installation
>            Reporter: Viraj Bhat
>            Assignee: Alan Gates
>         Attachments: mydata.txt, tmpfileload.pig


[jira] Updated: (PIG-619) Dumping empty results produces "Unable to get results for /tmp/temp-1964806069/tmp256878619 org.apache.pig.builtin.BinStorage" message

Posted by "Alan Gates (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/PIG-619?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Alan Gates updated PIG-619:
---------------------------

    Fix Version/s: 0.3.0
           Status: Patch Available  (was: Open)

To see this behavior, you need three MapReduce jobs, something like:

A = load
B = filter everything out
C = group
D = foreach
E = distinct
F = group
G = foreach
store G

In this case the first job (A-D) runs and produces zero-length part files.  The second job (E) runs, but no maps are started because its input files are zero length.  As a result, Hadoop now seems to create no output files for this second job.  The third job (F-G) then fails, complaining that its input files don't exist.  The patch changes Pig's slicer to return at least one input split per part file even when the file is zero length.
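
The idea behind the change can be sketched in a few lines of Java. This is only an illustration under stated assumptions, not the code in PIG-619.patch: the class SplitSketch, the Split type, the splitsFor method and the emitEmptySplit flag are all hypothetical names, and the real slicer works against Hadoop file status objects rather than plain lengths. With the flag off, a zero-length part file yields no splits, so no map task would be started for it; with the flag on, it yields one empty split.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the split computation; not Pig's actual slicer code.
final class SplitSketch {

    /** A byte range of one part file that a single map task would read. */
    static final class Split {
        final String file;
        final long start;
        final long length;

        Split(String file, long start, long length) {
            this.file = file;
            this.start = start;
            this.length = length;
        }
    }

    /**
     * Cuts a part file into block-sized splits. When the file is zero length
     * and emitEmptySplit is true, one empty split is still returned, so that
     * a map task runs and the job writes (empty) output files.
     */
    static List<Split> splitsFor(String file, long fileLength, long blockSize,
                                 boolean emitEmptySplit) {
        if (blockSize <= 0) {
            throw new IllegalArgumentException("blockSize must be positive");
        }
        List<Split> splits = new ArrayList<>();
        // Regular case: one split per block, the last one possibly shorter.
        for (long pos = 0; pos < fileLength; pos += blockSize) {
            splits.add(new Split(file, pos, Math.min(blockSize, fileLength - pos)));
        }
        // Zero-length file: the loop above added nothing.
        if (splits.isEmpty() && emitEmptySplit) {
            splits.add(new Split(file, 0, 0));
        }
        return splits;
    }
}
```

Calling splitsFor("part-00000", 0, 64, false) models the old behavior (an empty list), while passing true models the patched behavior (a single split of length 0). Non-empty files are split the same way in both cases.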

> Dumping empty results produces "Unable to get results for /tmp/temp-1964806069/tmp256878619  org.apache.pig.builtin.BinStorage" message
> ---------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: PIG-619
>                 URL: https://issues.apache.org/jira/browse/PIG-619
>             Project: Pig
>          Issue Type: Bug
>          Components: impl
>    Affects Versions: 0.2.0
>         Environment: Hadoop 18, Multi-node hadoop installation
>            Reporter: Viraj Bhat
>            Assignee: Alan Gates
>             Fix For: 0.3.0
>
>         Attachments: mydata.txt, PIG-619.patch, tmpfileload.pig


[jira] Commented: (PIG-619) Dumping empty results produces "Unable to get results for /tmp/temp-1964806069/tmp256878619 org.apache.pig.builtin.BinStorage" message

Posted by "Olga Natkovich (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/PIG-619?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12709894#action_12709894 ] 

Olga Natkovich commented on PIG-619:
------------------------------------

+1 on the patch. It would be good to add a comment that explains why we are doing this. 

> Dumping empty results produces "Unable to get results for /tmp/temp-1964806069/tmp256878619  org.apache.pig.builtin.BinStorage" message
> ---------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: PIG-619
>                 URL: https://issues.apache.org/jira/browse/PIG-619
>             Project: Pig
>          Issue Type: Bug
>          Components: impl
>    Affects Versions: 0.2.0
>         Environment: Hadoop 18, Multi-node hadoop installation
>            Reporter: Viraj Bhat
>            Assignee: Alan Gates
>             Fix For: 0.3.0
>
>         Attachments: mydata.txt, PIG-619.patch, tmpfileload.pig


[jira] Updated: (PIG-619) Dumping empty results produces "Unable to get results for /tmp/temp-1964806069/tmp256878619 org.apache.pig.builtin.BinStorage" message

Posted by "Viraj Bhat (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/PIG-619?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Viraj Bhat updated PIG-619:
---------------------------

    Attachment: mydata.txt

Test data

> Dumping empty results produces "Unable to get results for /tmp/temp-1964806069/tmp256878619  org.apache.pig.builtin.BinStorage" message
> ---------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: PIG-619
>                 URL: https://issues.apache.org/jira/browse/PIG-619
>             Project: Pig
>          Issue Type: Bug
>          Components: impl
>    Affects Versions: types_branch
>         Environment: Hadoop 18, Multi-node hadoop installation
>            Reporter: Viraj Bhat
>             Fix For: types_branch
>
>         Attachments: mydata.txt, tmpfileload.pig

[jira] Updated: (PIG-619) Dumping empty results produces "Unable to get results for /tmp/temp-1964806069/tmp256878619 org.apache.pig.builtin.BinStorage" message

Posted by "Alan Gates (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/PIG-619?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Alan Gates updated PIG-619:
---------------------------

    Resolution: Fixed
        Status: Resolved  (was: Patch Available)

Patch checked in.

> Dumping empty results produces "Unable to get results for /tmp/temp-1964806069/tmp256878619  org.apache.pig.builtin.BinStorage" message
> ---------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: PIG-619
>                 URL: https://issues.apache.org/jira/browse/PIG-619
>             Project: Pig
>          Issue Type: Bug
>          Components: impl
>    Affects Versions: 0.2.0
>         Environment: Hadoop 18, Multi-node hadoop installation
>            Reporter: Viraj Bhat
>            Assignee: Alan Gates
>             Fix For: 0.3.0
>
>         Attachments: mydata.txt, PIG-619.patch, tmpfileload.pig

[jira] Updated: (PIG-619) Dumping empty results produces "Unable to get results for /tmp/temp-1964806069/tmp256878619 org.apache.pig.builtin.BinStorage" message

Posted by "Alan Gates (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/PIG-619?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Alan Gates updated PIG-619:
---------------------------

    Attachment: PIG-619.patch

> Dumping empty results produces "Unable to get results for /tmp/temp-1964806069/tmp256878619  org.apache.pig.builtin.BinStorage" message
> ---------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: PIG-619
>                 URL: https://issues.apache.org/jira/browse/PIG-619
>             Project: Pig
>          Issue Type: Bug
>          Components: impl
>    Affects Versions: 0.2.0
>         Environment: Hadoop 18, Multi-node hadoop installation
>            Reporter: Viraj Bhat
>            Assignee: Alan Gates
>         Attachments: mydata.txt, PIG-619.patch, tmpfileload.pig

[jira] Commented: (PIG-619) Dumping empty results produces "Unable to get results for /tmp/temp-1964806069/tmp256878619 org.apache.pig.builtin.BinStorage" message

Posted by "Alan Gates (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/PIG-619?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12703950#action_12703950 ] 

Alan Gates commented on PIG-619:
--------------------------------

Phase 2 (merging jobs just in the map phase) is already in. Phase 3 (merging jobs across map/reduce boundaries) should be in by the end of this week.

> Dumping empty results produces "Unable to get results for /tmp/temp-1964806069/tmp256878619  org.apache.pig.builtin.BinStorage" message
> ---------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: PIG-619
>                 URL: https://issues.apache.org/jira/browse/PIG-619
>             Project: Pig
>          Issue Type: Bug
>          Components: impl
>    Affects Versions: 0.2.0
>         Environment: Hadoop 18, Multi-node hadoop installation
>            Reporter: Viraj Bhat
>            Assignee: Alan Gates
>         Attachments: mydata.txt, tmpfileload.pig

[jira] Commented: (PIG-619) Dumping empty results produces "Unable to get results for /tmp/temp-1964806069/tmp256878619 org.apache.pig.builtin.BinStorage" message

Posted by "Viraj Bhat (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/PIG-619?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12703941#action_12703941 ] 

Viraj Bhat commented on PIG-619:
--------------------------------

So when does the Multi-Store query optimization get committed/merged into the main branch (where this is the default way multi-store happens)?
Viraj

> Dumping empty results produces "Unable to get results for /tmp/temp-1964806069/tmp256878619  org.apache.pig.builtin.BinStorage" message
> ---------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: PIG-619
>                 URL: https://issues.apache.org/jira/browse/PIG-619
>             Project: Pig
>          Issue Type: Bug
>          Components: impl
>    Affects Versions: 0.2.0
>         Environment: Hadoop 18, Multi-node hadoop installation
>            Reporter: Viraj Bhat
>            Assignee: Alan Gates
>         Attachments: mydata.txt, tmpfileload.pig