Posted to user@pig.apache.org by Eyal Allweil <ey...@yahoo.com.INVALID> on 2016/07/04 14:05:45 UTC

Re: Schema issue while storing multiple pig outputs using CSVExcelStorage

I can replicate these results on Pig 0.14.
Did anyone open a Jira issue for this?
 

    On Thursday, March 10, 2016 12:24 PM, Sarath Sasidharan <ss...@bol.com> wrote:
 

Hi All,

I have a script that stores two relations with different schemas using CSVExcelStorage.

The issue I see is that the script picks up the schema of the last store function and applies it to all store functions, overriding the schemas of the previous stores. Is this a known issue, and is there a fix for it?

My sample script looks like this:

=============================================================

masterInput = load 'hbase://xyz' using org.apache.pig.backend.hadoop.hbase.HBaseStorage(
                    'f:a,f:b,f:c,f:d')
          as (a,b,c,d);

input2 = foreach masterInput
                  generate
                        a,b;

input3 = foreach masterInput
                  generate
                      c,d;

store input2 into '/dir/ab'
using org.apache.pig.piggybank.storage.CSVExcelStorage('\t','YES_MULTILINE', 'UNIX', 'WRITE_OUTPUT_HEADER');

store input3 into '/dir/cd'
using org.apache.pig.piggybank.storage.CSVExcelStorage('\t','YES_MULTILINE', 'UNIX', 'WRITE_OUTPUT_HEADER');

=============================================================

Expected output:

file 1        file 2
a,b           c,d
10,20         30,40


Actual output:

file 1        file 2
c,d           c,d
10,20         30,40

Thanks and Regards,

Sarath Sasidharan
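One plausible way this last-store-wins behaviour can arise is if the storage function keeps its header in state shared by every instance of the class, so the schema recorded for the final STORE overwrites the earlier ones before any file is written. The Java below is a simplified, hypothetical model of that idea: the registry, HeaderStore, checkSchema(String) and headerToWrite are all made up for illustration. It is not Pig's UDFContext or CSVExcelStorage code, and whether the fix referenced later in this thread works along these lines is an assumption.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Properties;

// Simplified, hypothetical model of per-store state in a Pig-like front end.
// This is NOT Pig's real UDFContext or CSVExcelStorage implementation.
public class HeaderKeyDemo {

    // Stand-in for a registry of Properties objects that all
    // store-function instances look up by a string key.
    static final Map<String, Properties> REGISTRY = new HashMap<>();

    static Properties propsFor(String key) {
        return REGISTRY.computeIfAbsent(key, k -> new Properties());
    }

    // Hypothetical store function that remembers its header in shared state.
    static class HeaderStore {
        private final String signature;    // unique per STORE statement
        private final boolean perStoreKey; // true = key includes the signature

        HeaderStore(String signature, boolean perStoreKey) {
            this.signature = signature;
            this.perStoreKey = perStoreKey;
        }

        private String key() {
            // Shared variant: one key for every instance of the class.
            // Per-store variant: class name plus this store's signature.
            return perStoreKey ? HeaderStore.class.getName() + "." + signature
                               : HeaderStore.class.getName();
        }

        // Records this store's header (called once per STORE in the model).
        void checkSchema(String header) {
            propsFor(key()).setProperty("header", header);
        }

        // Returns the header this store would write to its output.
        String headerToWrite() {
            return propsFor(key()).getProperty("header");
        }
    }

    static List<String> run(boolean perStoreKey) {
        REGISTRY.clear();
        HeaderStore ab = new HeaderStore("store_ab", perStoreKey);
        HeaderStore cd = new HeaderStore("store_cd", perStoreKey);
        // In this model, both stores record their header before either writes.
        ab.checkSchema("a,b");
        cd.checkSchema("c,d");
        return List.of(ab.headerToWrite(), cd.headerToWrite());
    }

    public static void main(String[] args) {
        System.out.println("shared key:    " + run(false)); // [c,d, c,d]
        System.out.println("per-store key: " + run(true));  // [a,b, c,d]
    }
}
```

With the shared key, both outputs get the c,d header, matching the actual output above; keying the state per store gives each output its own header.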


  

Re: Schema issue while storing multiple pig outputs using CSVExcelStorage

Posted by Rohini Palaniswamy <ro...@gmail.com>.
Can you try it in Pig 0.16? Niels fixed this in
https://issues.apache.org/jira/browse/PIG-4689


Re: Schema issue while storing multiple pig outputs using CSVExcelStorage

Posted by Sarath Sasidharan <ss...@bol.com>.
Hi Eyal,

I have created a ticket: PIG-4943 <https://issues.apache.org/jira/browse/PIG-4943>


Thanks and Regards,

Sarath
