
Posted to dev@pig.apache.org by "Olga Natkovich (JIRA)" <ji...@apache.org> on 2009/12/22 19:28:29 UTC

[jira] Resolved: (PIG-1168) Dump produces wrong results

     [ https://issues.apache.org/jira/browse/PIG-1168?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Olga Natkovich resolved PIG-1168.
---------------------------------

    Resolution: Won't Fix

This is by design. Dump is meant for interactive use, not batch mode, so it is executed immediately rather than as part of a multiquery batch.
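The difference between the two execution styles can be illustrated with a small Java simulation. This is a sketch only, not Pig's execution engine. It assumes, as the report describes, that MyUDF attaches a randomly generated UUID to each input record. All class and method names (`DumpReexecutionSketch`, `myUdf`, `project`, `dumpEach`, `sharedPrefix`) are hypothetical. `dumpEach` recomputes the `ccm` lineage for every dump. `sharedPrefix` computes it once and derives `bug` from that single result, which is roughly what batch (multiquery) execution of a shared prefix achieves.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.UUID;
import java.util.function.Function;
import java.util.function.Supplier;

public class DumpReexecutionSketch {

    // Hypothetical stand-in for MyUDF: appends a fresh random UUID to the input text.
    // Assumes the input text itself contains no '|' character.
    static String myUdf(String text) {
        return text + "|" + UUID.randomUUID();
    }

    // Hypothetical stand-in for the `bug` projection: keeps only the UUID part of a ccm row.
    static String project(String ccmRow) {
        return ccmRow.substring(ccmRow.indexOf('|') + 1);
    }

    // Applies f to every element, like a map-only FOREACH ... GENERATE.
    static List<String> map(List<String> in, Function<String, String> f) {
        List<String> out = new ArrayList<>();
        for (String s : in) {
            out.add(f.apply(s));
        }
        return out;
    }

    // Interactive-style: each "dump" re-runs the lineage from the load, so MyUDF runs twice.
    static List<List<String>> dumpEach(List<String> raw) {
        Supplier<List<String>> ccm = () -> map(raw, DumpReexecutionSketch::myUdf);
        List<String> ccmDump = ccm.get();                                    // first execution
        List<String> bugDump = map(ccm.get(), DumpReexecutionSketch::project); // ccm recomputed
        return List.of(ccmDump, bugDump);
    }

    // Batch-style: the shared prefix (ccm) is computed once and both outputs reuse it.
    static List<List<String>> sharedPrefix(List<String> raw) {
        List<String> ccm = map(raw, DumpReexecutionSketch::myUdf);
        List<String> bug = map(ccm, DumpReexecutionSketch::project);
        return List.of(ccm, bug);
    }

    // True when every bug row matches the UUID carried by the corresponding ccm row.
    static boolean consistent(List<List<String>> outputs) {
        List<String> ccm = outputs.get(0);
        List<String> bug = outputs.get(1);
        for (int i = 0; i < ccm.size(); i++) {
            if (!project(ccm.get(i)).equals(bug.get(i))) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        List<String> raw = List.of("a", "b", "c");
        System.out.println("per-dump re-execution consistent: " + consistent(dumpEach(raw)));
        System.out.println("shared-prefix execution consistent: " + consistent(sharedPrefix(raw)));
    }
}
```

Running `main` should report the per-dump variant as inconsistent and the shared-prefix variant as consistent. The chance of two random UUIDs colliding is negligible.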

> Dump produces wrong results
> ---------------------------
>
>                 Key: PIG-1168
>                 URL: https://issues.apache.org/jira/browse/PIG-1168
>             Project: Pig
>          Issue Type: Bug
>            Reporter: Ankur
>
> For a map-only job, dump re-executes every Pig Latin statement from the beginning, assuming that each will produce the same result. This assumption does not hold when the script invokes nondeterministic UDFs. Consider the following script:
> raw = LOAD '$input' USING PigStorage() AS (text_string:chararray);
> DUMP raw;
> ccm = FOREACH raw GENERATE MyUDF(text_string);
> DUMP ccm;
> bug = FOREACH ccm GENERATE ccmObj;
> DUMP bug;
> The UDF MyUDF generates a tuple in which one of the fields is a randomly generated UUID. So even though one would expect relations 'ccm' and 'bug' to contain identical data, they differ because each DUMP re-executes the script from the beginning. This breaks the application logic.
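One possible workaround on the UDF side is to make the generated ID deterministic. Re-executing the script would then reproduce the same values. The report does not show how MyUDF builds its UUID, so the following is a minimal sketch under that assumption. It uses the JDK's `UUID.nameUUIDFromBytes`, which returns a name-based (version 3) UUID derived from the input bytes. The helper `idFor` is hypothetical. In an actual Pig UDF, this logic would sit inside the function's `exec` method. The tradeoff is that identical `text_string` values receive identical IDs, which may or may not be acceptable for the application.

```java
import java.nio.charset.StandardCharsets;
import java.util.UUID;

public class DeterministicIdSketch {

    // Hypothetical helper: derives a name-based (version 3) UUID from the input text,
    // so recomputing the same record always yields the same ID.
    static UUID idFor(String textString) {
        return UUID.nameUUIDFromBytes(textString.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) {
        UUID first = idFor("hello world");
        UUID second = idFor("hello world"); // simulates re-execution of the same record
        System.out.println(first.equals(second)); // true
        System.out.println(first.version());      // 3
    }
}
```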

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.