Posted to dev@pig.apache.org by "Xianda Ke (JIRA)" <ji...@apache.org> on 2016/06/01 04:58:12 UTC

[jira] [Resolved] (PIG-4857) Last record is missing in STREAM operator

     [ https://issues.apache.org/jira/browse/PIG-4857?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Xianda Ke resolved PIG-4857.
----------------------------
    Resolution: Fixed

Already fixed in PIG-4876.

> Last record is missing in STREAM operator
> -----------------------------------------
>
>                 Key: PIG-4857
>                 URL: https://issues.apache.org/jira/browse/PIG-4857
>             Project: Pig
>          Issue Type: Sub-task
>          Components: spark
>            Reporter: Xianda Ke
>            Assignee: Xianda Ke
>             Fix For: spark-branch
>
>         Attachments: PIG-4857.patch
>
>
> This bug is similar to PIG-4842.
> Scenario:
> {code}
> cat input.txt
> 1
> 1
> 2
> {code}
> Pig script:
> {code}
> REGISTER myudfs.jar;
> A = LOAD 'input.txt' USING myudfs.DummyCollectableLoader() AS (id); 
> B = GROUP A BY $0 USING 'collected';    -- (1,{(1),(1)}), (2,{(2)})
> C = STREAM B THROUGH ` awk '{
>      print $0;
> }'`;
> DUMP C;
> {code}
> Expected Result:
> {code}
> (1,{(1),(1)})
> (2,{(2)})
> {code}
> Actual Result:
> {code}
> (1,{(1),(1)})
> {code}
> The last record is missing...
> Root Cause:
> When the predecessor sets the endOfAllInput flag to true, it is still buffering the last record, which is the input to STREAM. POStream then sees that endOfAllInput is true and stops, even though that last input record has not been consumed yet.
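The root cause can be reproduced with a small, self-contained model. This is a sketch under stated assumptions. It is not Pig's actual POStream code and not the PIG-4876 patch. The class `Predecessor` and the methods `consumeBuggy` and `consumeFixed` are hypothetical. The model assumes only the behavior described above: the predecessor raises `endOfAllInput` while its last record is still in its buffer. The buggy consumer checks the flag before draining the buffer and drops that record. The fixed consumer drains the buffer first.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.Iterator;
import java.util.List;

public class StreamLastRecordDemo {

    // Hypothetical predecessor: moves one record into its buffer per pull()
    // and raises endOfAllInput as soon as its source is exhausted, even
    // though the record it just read is still sitting in the buffer.
    static final class Predecessor {
        private final Iterator<String> source;
        private final Deque<String> buffer = new ArrayDeque<>();
        boolean endOfAllInput = false;

        Predecessor(List<String> records) {
            this.source = records.iterator();
        }

        void pull() {
            if (source.hasNext()) {
                buffer.add(source.next());
            }
            if (!source.hasNext()) {
                endOfAllInput = true;
            }
        }

        String poll() {
            return buffer.poll();
        }

        boolean hasBuffered() {
            return !buffer.isEmpty();
        }
    }

    // Buggy consumer (hypothetical): stops as soon as it sees the flag,
    // so the record buffered in that same step is never consumed.
    static List<String> consumeBuggy(Predecessor p) {
        List<String> out = new ArrayList<>();
        while (true) {
            p.pull();
            if (p.endOfAllInput) {
                break; // the last buffered record is dropped here
            }
            out.add(p.poll());
        }
        return out;
    }

    // Fixed consumer (hypothetical): treats the flag as "no more input will
    // arrive" and drains whatever is still buffered before finishing.
    static List<String> consumeFixed(Predecessor p) {
        List<String> out = new ArrayList<>();
        while (true) {
            p.pull();
            while (p.hasBuffered()) {
                out.add(p.poll());
            }
            if (p.endOfAllInput) {
                break;
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> input = List.of("(1,{(1),(1)})", "(2,{(2)})");
        System.out.println("buggy: " + consumeBuggy(new Predecessor(input)));
        System.out.println("fixed: " + consumeFixed(new Predecessor(input)));
    }
}
```

With the two grouped records from the scenario, the buggy consumer returns only `(1,{(1),(1)})`, which matches the actual result. The fixed consumer returns both records. The design point is that an end-of-input flag should mean "nothing more will arrive", not "nothing is left to process". Buffered records must be drained before the operator finishes.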



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)