Posted to user@pig.apache.org by michael <mi...@gmail.com> on 2013/11/22 02:03:11 UTC

Script produces incorrect output unless ColumnMapKeyPrune optimisation is disabled.

Hello,

My script gives incorrect output when the ColumnMapKeyPrune optimisation is
enabled (as it is by default).

"pig -x local myscript.pig" produces incorrect output in output/e.csv.

However, "pig -x local -t ColumnMapKeyPrune myscript.pig" works correctly.

I checked the bug list but couldn't find anything related.

Am I doing something wrong?  Is this a known issue or should I raise a new
bug report?

I am running Apache Pig 0.11.1 on Linux.

Regards,
Michael


To reproduce, here are the script and data file contents:
---------------------------------------------------------------------------------------
myscript.pig

register /usr/local/pig/contrib/piggybank/java/piggybank.jar;
define CSVLoader org.apache.pig.piggybank.storage.CSVLoader;
define CSVExcelStorage org.apache.pig.piggybank.storage.CSVExcelStorage;

a1 = load 'test1.csv' using CSVExcelStorage(',') as (
    A:chararray,
    B:chararray,
    C:int,
    D:chararray,
    E:chararray,
    F:chararray,
    G:int,
    H:chararray);

-- a2: rows where both B and F are non-empty; e0: all remaining rows
split a1 into
    a2 if B != '' and F != '',
    e0 otherwise;

e1 = foreach e0 generate A, B, F, H, D, E, G;
x1 = foreach a2 generate A, G;

a4 = load 'test2.csv' using CSVExcelStorage(',') as (A:chararray);
x2 = foreach a4 generate A;

x3 = join x1 by A left, x2 by A;

STORE e1 into './output/e' USING PigStorage(',', '-schema');
STORE x3 INTO './output/x' USING PigStorage(',', '-schema');

fs -rm output/e/.pig_schema;
fs -getmerge output/e output/e.csv;

---------------------------------------------------------------------------------------
test1.csv

a,x,1,,,x
b,x,1,,,x
c,x,1,,,x
d,x,1,,,x
e,x,1,,,x
f,x,1,,,x
g,x,1,,,x

---------------------------------------------------------------------------------------
test2.csv

a
b
c
d
e
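
---------------------------------------------------------------------------------------
Expected split of test1.csv (sketch)

As a cross-check of what output/e.csv should contain, the SPLIT condition can be modelled outside Pig with awk. This is a minimal sketch, not Pig's implementation. It assumes a plain comma split matches what CSVExcelStorage(',') produces for this data (no quoted fields, no embedded commas), with B as field 2 and F as field 6. Every row of test1.csv has B=x and F=x. If that assumption holds, all seven rows should go to a2 and none to e0, so e1, and therefore output/e.csv, should contain no data rows.

```shell
#!/bin/sh
# Sketch: model the SPLIT in myscript.pig with awk to see which rows of
# test1.csv should reach a2 and which should reach e0.
# Assumption: a plain comma split matches CSVExcelStorage(',') for this data
# (no quoted fields, no embedded commas).

# Use a temporary file so an existing test1.csv is not overwritten.
data=$(mktemp)

# Same contents as test1.csv above.
cat > "$data" <<'EOF'
a,x,1,,,x
b,x,1,,,x
c,x,1,,,x
d,x,1,,,x
e,x,1,,,x
f,x,1,,,x
g,x,1,,,x
EOF

# Column B is field 2 and column F is field 6.
# a2: B != '' and F != ''; e0 (otherwise): every other row.
a2_rows=$(awk -F',' '$2 != "" && $6 != "" { n++ } END { print n + 0 }' "$data")
e0_rows=$(awk -F',' '!($2 != "" && $6 != "") { n++ } END { print n + 0 }' "$data")

rm -f "$data"

echo "rows routed to a2: $a2_rows"   # prints "rows routed to a2: 7"
echo "rows routed to e0: $e0_rows"   # prints "rows routed to e0: 0"
```

If the default run (with ColumnMapKeyPrune enabled) writes any of the rows a to g into output/e.csv, those rows would have ended up on the wrong side of the split.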