Posted to dev@pig.apache.org by "Arun C Murthy (JIRA)" <ji...@apache.org> on 2008/05/07 10:29:55 UTC
[jira] Issue Comment Edited: (PIG-232) Number of output rows in the
log seems to be invalid
[ https://issues.apache.org/jira/browse/PIG-232?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12594825#action_12594825 ]
acmurthy edited comment on PIG-232 at 5/7/08 1:29 AM:
-----------------------------------------------------------
Olga, this happens because the stream/store optimization kicks in, so only the 'binary tuples' are counted in the report. Could you please retry with the optimization switched off?
/pig/studenttab10k has 10,000 records.
Now:
{noformat}
IP = load '/pig/studenttab10k';
OP = stream IP through `perl -ne 'print $_;'`;
store OP into '/pig/out' using PigStorage(',');
{noformat}
correctly shows 10,000 as the no. of output-records while:
{noformat}
IP = load '/pig/studenttab10k';
OP = stream IP through `perl -ne 'print $_;'`;
store OP into '/pig/out';
{noformat}
shows the no. of output-records as 4 due to the stream/store optimization.
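One way to cross-check the real row count independently of the number in the log is to count the streamed relation in the script itself. This is only a sketch: the aliases G and C are made up for illustration, and it assumes the same input path and pass-through perl command as above.

{noformat}
-- Count the rows produced by the streaming command, without relying on the store-side log.
IP = load '/pig/studenttab10k';
OP = stream IP through `perl -ne 'print $_;'`;
-- Put every streamed row into a single group so COUNT sees all of them.
G = group OP all;
C = foreach G generate COUNT(OP);
dump C;
{noformat}

If the command really passes every record through, the dumped count should match the number of input records, whatever the log reports for the store.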
Could you please re-check? Thanks!
was (Author: acmurthy):
Olga, this is due to the fact that the stream/store optimization is kicking in and hence only the 'binary tuples' are being reported... could you please try by switching off the optimization?
/pig/studenttab10k has 10,000 records.
Now:
{noformat}
define CMD `script.pl` ship('../pig/scripts/script.pl');
IP = load '/pig/studenttab10k';
OP = stream IP through CMD;
store OP into '/pig/out' using PigStorage(',');
{noformat}
correctly shows 10,000 as the no. of output-records while:
{noformat}
define CMD `script.pl` ship('../pig/scripts/script.pl');
IP = load '/pig/studenttab10k';
OP = stream IP through CMD;
store OP into '/pig/out';
{noformat}
shows the no. of output-records as 4 due to the stream/store optimization.
Could you please re-check? Thanks!
> Number of output rows in the log seems to be invalid
> -----------------------------------------------------
>
> Key: PIG-232
> URL: https://issues.apache.org/jira/browse/PIG-232
> Project: Pig
> Issue Type: Bug
> Reporter: Olga Natkovich
> Assignee: Arun C Murthy
>
> My pig script:
> define CMD `perl PigStreamingBad.pl end` ship('PigStreamingBad.pl') stderr('CMD' limit 1);
> A = load 'studenttab10k';
> B = stream A through CMD;
> store B into 'out';
> My perl script:
> use strict;
> # This script is used to test streaming error cases in pig.
> # Usage: PigStreaming.pl <start|middle|end>
> # the parameter tells the application when to exit with error
> if ($#ARGV < 0)
> {
> print STDERR "Usage PigStreaming.pl <start|middle|end>\n";
> exit (-1);
> }
> my $pos = $ARGV[0];
> if ($pos eq "start")
> {
> print STDERR "Failed in the beginning of the processing\n";
> exit(1);
> }
> print STDERR "PigStreamingBad.pl: starting processing\n";
> my $cnt = 0;
> while (<STDIN>)
> {
> print "$_";
> $cnt++;
> print STDERR "PigStreaming.pl: processing $_\n";
> if (($cnt > 100) && ($pos eq "middle"))
> {
> print STDERR "Failed in the middle of processing\n";
> exit(2);
> }
> }
> print STDERR "Failed at the end of processing\n";
> exit(3);