Posted to dev@pig.apache.org by "Daniel Charles (Updated) (JIRA)" <ji...@apache.org> on 2011/11/16 15:17:51 UTC

[jira] [Updated] (PIG-2368) Penny doesn't recognize the schema properly

     [ https://issues.apache.org/jira/browse/PIG-2368?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Daniel Charles updated PIG-2368:
--------------------------------

    Description: 
When executed with Penny, a script that has no schema specified doesn't convert the values to their proper types, which corrupts the result.

For reference:
copy a file with the content below to /home/hadoop/tablea.
0;4;2
1;3;3
2;2;0
3;1;4
4;0;1

script.pig content:
a = load '/home/hadoop/tablea' using PigStorage(';');
b = filter a by $2 < 1000;
store b into '/home/hadoop/tablea.out';
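
As a possible workaround (a sketch only, not verified against Penny; the field names c0/c1/c2 are made up for illustration), declaring an explicit schema in the LOAD statement types the columns up front, so the filter no longer depends on the implicit bytearray-to-int cast that Penny's rebuilt plan loses:

```pig
-- explicit schema: each field is declared as int at load time,
-- so $2 < 1000 is a plain integer comparison
a = load '/home/hadoop/tablea' using PigStorage(';') as (c0:int, c1:int, c2:int);
b = filter a by c2 < 1000;
store b into '/home/hadoop/tablea.out';
```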

PENNY:
Command Line:
java -cp /var/tmp/hadoop-0.20.2/conf:/var/tmp/pig-0.9.1/pig.jar:/var/tmp/pig-0.9.1/contrib/penny/java/penny.jar  org.apache.pig.penny.apps.ri.Main script.pig b 2
Output summary:

Input(s):
Successfully read 5 records (30 bytes) from: "/home/hadoop/tablea"

Output(s):
Successfully stored 0 records in: "/home/hadoop/tablea.out"

Counters:
Total records written : 0 [OUTPUT FILE IS EMPTY]
Total bytes written : 0
Spillable Memory Manager spill count : 0
Total bags proactively spilled: 0
Total records proactively spilled: 0
11/11/15 18:57:54 INFO mapReduceLayer.MapReduceLauncher: Success!
----------------------

Using the same environment and running pig without penny 
Command Line:
pig script.pig
Output summary:
Input(s):
Successfully read 5 records (30 bytes) from: "/home/hadoop/tablea"

Output(s):
Successfully stored 5 records (30 bytes) in: "/home/hadoop/tablea.out"

Counters:
Total records written : 5
Total bytes written : 30
Spillable Memory Manager spill count : 0
Total bags proactively spilled: 0
Total records proactively spilled: 0
================

This happens because when the plan is rebuilt in Penny, it fails to properly set the type of the column, which is recognized as bytearray instead of integer.
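
The bytearray typing can also be worked around at the comparison site (again only a sketch, not verified with Penny): an explicit cast in the FILTER forces the operand to int even when the rebuilt plan has dropped the column type:

```pig
-- explicit cast: force $2 to int before the numeric comparison,
-- instead of relying on the implicit bytearray cast
a = load '/home/hadoop/tablea' using PigStorage(';');
b = filter a by (int)$2 < 1000;
store b into '/home/hadoop/tablea.out';
```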


A similar issue can be seen with the -nop- application too.


  was:
When executed with Penny, a Pig script either has no output or fails to filter data.
For reference:
copy a file with the content below to /home/hadoop/tablea.
0;4;2
1;3;3
2;2;0
3;1;4
4;0;1

script.pig content:
a = load '/home/hadoop/tablea' using PigStorage(';');
b = filter a by $2 < 1000;
store b into '/home/hadoop/tablea.out';

PENNY:
Command Line:
java -cp /var/tmp/hadoop-0.20.2/conf:/var/tmp/pig-0.9.1/pig.jar:/var/tmp/pig-0.9.1/contrib/penny/java/penny.jar  org.apache.pig.penny.apps.ri.Main script.pig b 2
Output summary:

Input(s):
Successfully read 5 records (30 bytes) from: "/home/hadoop/tablea"

Output(s):
Successfully stored 0 records in: "/home/hadoop/tablea.out"

Counters:
Total records written : 0 [OUTPUT FILE IS EMPTY]
Total bytes written : 0
Spillable Memory Manager spill count : 0
Total bags proactively spilled: 0
Total records proactively spilled: 0
11/11/15 18:57:54 INFO mapReduceLayer.MapReduceLauncher: Success!
----------------------

Using the same environment and running pig without penny 
Command Line:
pig script.pig
Output summary:
Input(s):
Successfully read 5 records (30 bytes) from: "/home/hadoop/tablea"

Output(s):
Successfully stored 5 records (30 bytes) in: "/home/hadoop/tablea.out"

Counters:
Total records written : 5
Total bytes written : 30
Spillable Memory Manager spill count : 0
Total bags proactively spilled: 0
Total records proactively spilled: 0
================

A similar script on the passwd file makes Penny filter the output data incorrectly:
a = load '/home/hadoop/passwd' using PigStorage(':');
b = filter a by $3 >200;
store b into '/home/hadoop/passwdout';

Penny
Input(s):
Successfully read 68 records (3513 bytes) from: "/home/hadoop/passwd"

Output(s):
Successfully stored 68 records (3513 bytes) in: "/home/hadoop/passwdout"

Counters:
Total records written : 68 [OUTPUT FILE WASN'T FILTERED CORRECTLY]
Total bytes written : 3513
Spillable Memory Manager spill count : 0
Total bags proactively spilled: 0
Total records proactively spilled: 0

---------------
Pig
Input(s):
Successfully read 68 records (3513 bytes) from: "/home/hadoop/passwd"

Output(s):
Successfully stored 46 records (2555 bytes) in: "/home/hadoop/passwdout"

Counters:
Total records written : 46
Total bytes written : 2555
Spillable Memory Manager spill count : 0
Total bags proactively spilled: 0
Total records proactively spilled: 0

A similar issue can be seen with the -nop- application too.


       Priority: Minor  (was: Major)
        Summary: Penny doesn't recognize the schema properly  (was: Penny doesn't store the output file correctly)
    
> Penny doesn't recognize the schema properly
> -------------------------------------------
>
>                 Key: PIG-2368
>                 URL: https://issues.apache.org/jira/browse/PIG-2368
>             Project: Pig
>          Issue Type: Bug
>    Affects Versions: 0.9.1
>         Environment: Testing in a cluster with 6 nodes
> Debian linux 64bit, 
> java 1.6, 
> Hadoop 0.20.2
>            Reporter: Daniel Charles
>            Priority: Minor
>
> When executed with Penny, a script that has no schema specified doesn't convert the values to their proper types, which corrupts the result.
> For reference:
> copy a file with the content below to /home/hadoop/tablea.
> 0;4;2
> 1;3;3
> 2;2;0
> 3;1;4
> 4;0;1
> script.pig content:
> a = load '/home/hadoop/tablea' using PigStorage(';');
> b = filter a by $2 < 1000;
> store b into '/home/hadoop/tablea.out';
> PENNY:
> Command Line:
> java -cp /var/tmp/hadoop-0.20.2/conf:/var/tmp/pig-0.9.1/pig.jar:/var/tmp/pig-0.9.1/contrib/penny/java/penny.jar  org.apache.pig.penny.apps.ri.Main script.pig b 2
> Output summary:
> Input(s):
> Successfully read 5 records (30 bytes) from: "/home/hadoop/tablea"
> Output(s):
> Successfully stored 0 records in: "/home/hadoop/tablea.out"
> Counters:
> Total records written : 0 [OUTPUT FILE IS EMPTY]
> Total bytes written : 0
> Spillable Memory Manager spill count : 0
> Total bags proactively spilled: 0
> Total records proactively spilled: 0
> 11/11/15 18:57:54 INFO mapReduceLayer.MapReduceLauncher: Success!
> ----------------------
> Using the same environment and running pig without penny 
> Command Line:
> pig script.pig
> Output summary:
> Input(s):
> Successfully read 5 records (30 bytes) from: "/home/hadoop/tablea"
> Output(s):
> Successfully stored 5 records (30 bytes) in: "/home/hadoop/tablea.out"
> Counters:
> Total records written : 5
> Total bytes written : 30
> Spillable Memory Manager spill count : 0
> Total bags proactively spilled: 0
> Total records proactively spilled: 0
> ================
> This happens because when the plan is rebuilt in Penny, it fails to properly set the type of the column, which is recognized as bytearray instead of integer.
> A similar issue can be seen with the -nop- application too.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira