Posted to dev@pig.apache.org by "Daniel Charles (Updated) (JIRA)" <ji...@apache.org> on 2011/11/15 19:20:51 UTC

[jira] [Updated] (PIG-2368) Penny doesn't store the output file correctly

     [ https://issues.apache.org/jira/browse/PIG-2368?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Daniel Charles updated PIG-2368:
--------------------------------

    Description: 
When executed with Penny, a Pig script either produces no output or fails to filter its data.
To reproduce, copy a file with the content below to /home/hadoop/tablea:
0;4;2
1;3;3
2;2;0
3;1;4
4;0;1

script.pig content:
a = load '/home/hadoop/tablea' using PigStorage(';');
b = filter a by $2 < 1000;
store b into '/home/hadoop/tablea.out';

PENNY:
Command Line:
java -cp /var/tmp/hadoop-0.20.2/conf:/var/tmp/pig-0.9.1/pig.jar:/var/tmp/pig-0.9.1/contrib/penny/java/penny.jar  org.apache.pig.penny.apps.ri.Main script.pig b 2
Output summary:

Input(s):
Successfully read 5 records (30 bytes) from: "/home/hadoop/tablea"

Output(s):
Successfully stored 0 records in: "/home/hadoop/tablea.out"

Counters:
Total records written : 0 [OUTPUT FILE IS EMPTY]
Total bytes written : 0
Spillable Memory Manager spill count : 0
Total bags proactively spilled: 0
Total records proactively spilled: 0
11/11/15 18:57:54 INFO mapReduceLayer.MapReduceLauncher: Success!
----------------------

Running the same script with plain Pig (without Penny) in the same environment:
Command Line:
pig script.pig
Output summary:
Input(s):
Successfully read 5 records (30 bytes) from: "/home/hadoop/tablea"

Output(s):
Successfully stored 5 records (30 bytes) in: "/home/hadoop/tablea.out"

Counters:
Total records written : 5
Total bytes written : 30
Spillable Memory Manager spill count : 0
Total bags proactively spilled: 0
Total records proactively spilled: 0
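For reference, the expected result of the tablea script can be worked out by hand. Every value in the third column ($2) is below 1000, so the filter should keep all five records. Each stored line (three single-digit fields, two delimiters and a newline) is 6 bytes, giving 30 bytes in total. That matches the plain Pig run above and indicates that the 0 records reported under Penny are wrong. The Python sketch below reproduces that arithmetic; the helper name `expected_filter` is hypothetical and only mimics the numeric comparison in `filter a by $2 < 1000`, not Pig's full casting rules.

```python
# Contents of /home/hadoop/tablea as given in the report
# (';'-separated fields, one record per line).
TABLEA = "0;4;2\n1;3;3\n2;2;0\n3;1;4\n4;0;1\n"


def expected_filter(text, field, predicate, sep=";"):
    """Hypothetical helper: keep the lines whose 0-based field `field`
    (Pig's $n), converted to int, satisfies `predicate`."""
    kept = []
    for line in text.splitlines(keepends=True):
        fields = line.rstrip("\n").split(sep)
        if predicate(int(fields[field])):
            kept.append(line)
    return kept


if __name__ == "__main__":
    kept = expected_filter(TABLEA, 2, lambda v: v < 1000)
    # Pig's default store delimiter is a tab rather than ';', but each
    # delimiter is one byte either way, so the byte count is unchanged.
    print(len(kept), "records,", sum(len(l) for l in kept), "bytes")
    # prints: 5 records, 30 bytes
```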
================

A similar script over the passwd file shows Penny failing to filter the output data:
a = load '/home/hadoop/passwd' using PigStorage(':');
b = filter a by $3 > 200;
store b into '/home/hadoop/passwdout';

Penny
Input(s):
Successfully read 68 records (3513 bytes) from: "/home/hadoop/passwd"

Output(s):
Successfully stored 68 records (3513 bytes) in: "/home/hadoop/passwdout"

Counters:
Total records written : 68 [OUTPUT FILE WASN'T FILTERED CORRECTLY]
Total bytes written : 3513
Spillable Memory Manager spill count : 0
Total bags proactively spilled: 0
Total records proactively spilled: 0

---------------
Pig
Input(s):
Successfully read 68 records (3513 bytes) from: "/home/hadoop/passwd"

Output(s):
Successfully stored 46 records (2555 bytes) in: "/home/hadoop/passwdout"

Counters:
Total records written : 46
Total bytes written : 2555
Spillable Memory Manager spill count : 0
Total bags proactively spilled: 0
Total records proactively spilled: 0
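The 46-record result for the passwd script can be cross-checked outside Pig. With 0-based field numbering, $3 in a standard passwd line (name:password:UID:GID:...) is the GID column, not the UID. The sketch below counts lines whose field $3 is greater than 200 in a local copy of the file (for example one fetched with `hadoop fs -get /home/hadoop/passwd`). The function and script names are hypothetical, and skipping lines whose field 3 is missing or non-numeric is an assumption meant to approximate Pig, where a failed cast yields null and a null comparison does not pass the filter.

```python
import sys


def count_field_above(lines, field=3, threshold=200, sep=":"):
    """Hypothetical helper: count lines whose 0-based field `field`
    (Pig's $3 by default, the GID in passwd layout) is an integer
    greater than `threshold`. Lines where the field is missing or not
    numeric are skipped, approximating Pig's null-on-failed-cast."""
    count = 0
    for line in lines:
        fields = line.rstrip("\n").split(sep)
        try:
            value = int(fields[field])
        except (IndexError, ValueError):
            continue
        if value > threshold:
            count += 1
    return count


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (hypothetical script name): python count_gid.py passwd
    with open(sys.argv[1]) as f:
        print(count_field_above(f))
```

If the count from a local copy disagrees with plain Pig's 46, the input itself should be examined before blaming Penny; if it agrees, Penny's 68 means no rows were filtered at all.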

A similar issue can also be seen with the -nop- application.
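Since the same symptom appears with more than one Penny application, it may help to compare each Penny run against a plain Pig run automatically. The sketch below scans console output for the "Successfully stored ... records" lines shown in the summaries above and reports any output path whose record count differs. The function names are hypothetical, and the regular expression is written only against the line formats that appear in this report.

```python
import re

# Matches the summary lines seen in this report, e.g.
#   Successfully stored 0 records in: "/home/hadoop/tablea.out"
#   Successfully stored 5 records (30 bytes) in: "/home/hadoop/tablea.out"
STORED = re.compile(r'Successfully stored (\d+) records(?: \(\d+ bytes\))? in: "([^"]+)"')


def stored_counts(log_text):
    """Map each output path to the record count reported for it."""
    return {path: int(n) for n, path in STORED.findall(log_text)}


def mismatches(pig_log, penny_log):
    """Return {path: (pig_count, penny_count)} for paths whose counts differ.
    Paths missing from the Penny log are reported with a count of None."""
    pig, penny = stored_counts(pig_log), stored_counts(penny_log)
    return {p: (n, penny.get(p)) for p, n in pig.items() if penny.get(p) != n}


if __name__ == "__main__":
    pig_log = 'Successfully stored 5 records (30 bytes) in: "/home/hadoop/tablea.out"'
    penny_log = 'Successfully stored 0 records in: "/home/hadoop/tablea.out"'
    print(mismatches(pig_log, penny_log))
    # prints: {'/home/hadoop/tablea.out': (5, 0)}
```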


  was:
When executed with Penny, a Pig script either produces no output or fails to filter its data when the files are in HDFS.


    
> Penny doesn't store the output file correctly
> ---------------------------------------------
>
>                 Key: PIG-2368
>                 URL: https://issues.apache.org/jira/browse/PIG-2368
>             Project: Pig
>          Issue Type: Bug
>    Affects Versions: 0.9.1
>         Environment: Testing in a cluster with 6 nodes
> Debian Linux 64-bit,
> Java 1.6,
> Hadoop 0.20.2
>            Reporter: Daniel Charles
>

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira