Posted to user@pig.apache.org by Kevin Weil <ke...@gmail.com> on 2008/10/08 21:00:16 UTC

BinStorage and versioning

Hi,

Say that I write out a tuple with three fields using BinStorage.  Then, a
couple of months later, I add a field, so from then on I write out tuples
with a new fourth field.  If I'm loading a directory containing files that
hold both kinds of tuples, and I say

a = LOAD 'my_directory' using BinStorage() as (f1, f2, f3, f4);

what will happen when BinStorage tries to load the first tuple with only
three fields?  Will f4 just be NULL?

Thanks,
Kevin

Re: BinStorage and versioning

Posted by Kevin Weil <ke...@gmail.com>.
Thanks!  Just tested: it does not work in the release, but it does work in
the types branch.  Here is the test I ran:

> cat f3.txt
a b c

> cat f4.txt
A B C D

a = load 'f3.txt' using PigStorage(' ') as (f1, f2, f3);
b = load 'f4.txt' using PigStorage(' ') as (F1, F2, F3, F4);
both = union a, b;
store both into 'fields.bz2' using BinStorage();

c = load 'fields.bz2' using BinStorage() as (p1, p2, p3, p4);
dump c;
(a, b, c)
(A, B, C, D)

k = foreach c generate p4;
dump k;
(D)
(NULL)
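
Since the short tuples come back with a null in the fourth position, the two
record versions can be told apart after loading. Below is a minimal sketch of
that idea, assuming the types-branch behavior shown above; the relation names
old_rows and new_rows are made up for illustration. Note that a four-field
record whose fourth field really is null would also end up in old_rows, so
this only works if the new field is always populated.

```pig
-- Load the mixed-version data; three-field tuples are padded with null
-- (behavior observed in the types branch, not in the release).
c = load 'fields.bz2' using BinStorage() as (p1, p2, p3, p4);

-- Route each record by whether the fourth field is present.
-- old_rows and new_rows are hypothetical names.
split c into old_rows if p4 is null, new_rows if p4 is not null;

dump old_rows;
dump new_rows;
```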



On Wed, Oct 8, 2008 at 12:59 PM, Olga Natkovich <ol...@yahoo-inc.com> wrote:

> It should work just fine; the missing data will be replaced with null at
> least in the types branch.
>
> Olga
>
> > -----Original Message-----
> > From: Kevin Weil [mailto:kevinweil@gmail.com]
> > Sent: Wednesday, October 08, 2008 12:00 PM
> > To: pig-user@incubator.apache.org
> > Subject: BinStorage and versioning
> >
> > Hi,
> >
> > Say that I write out a tuple with three fields using
> > BinStorage.  And then in a couple months, I add a parameter,
> > so now I write out a tuple with a new fourth field.  If I'm
> > loading a directory containing the files that have both of
> > these tuples, and I say
> >
> > a = LOAD 'my_directory' using BinStorage() as (f1, f2, f3, f4)
> >
> > what will happen when BinStorage tries to load the first
> > tuple with only three fields?  Will f4 just be NULL?
> >
> > Thanks,
> > Kevin
> >
>

RE: BinStorage and versioning

Posted by Olga Natkovich <ol...@yahoo-inc.com>.
It should work just fine; the missing data will be replaced with null at
least in the types branch.

Olga 
