Posted to user@pig.apache.org by Saumitra Shahapure <sa...@gmail.com> on 2011/06/14 21:46:36 UTC

LOAD clause with Bag

Hello,

When a LOAD clause declares a bag as a field, what input file structure is
expected? Can the default PigStorage() function handle that?
For example, given
A = LOAD 'data.txt' AS (B: bag {T: tuple(t1:int, t2:int, t3:int)});
what structure of data.txt is expected? Is it possible to write a StoreFunc
in this case?

Also, if we have multidimensional data (like a 2D n*n matrix, where n varies
with the input), can we expect a bag that contains each row as a nested bag?
-- 
Saumitra S. Shahapure

Re: LOAD clause with Bag

Posted by Daniel Dai <ji...@yahoo-inc.com>.
I opened a Jira https://issues.apache.org/jira/browse/PIG-2126 to 
address it.

Daniel

On 06/14/2011 03:52 PM, Russell Jurney wrote:
> Is this in the docs yet?  I've been asked this question a dozen times.
>
> Russ
>


Re: LOAD clause with Bag

Posted by Russell Jurney <ru...@gmail.com>.
Is this in the docs yet?  I've been asked this question a dozen times.

Russ

On Tue, Jun 14, 2011 at 3:02 PM, Daniel Dai <ji...@yahoo-inc.com> wrote:

> data.txt can be:
> ({(1,2,3),(4,5,6)})
> ({(7,8,9),(10,11,12)})
>
> Bags can also be nested, e.g.:
> A = LOAD 'data.txt' AS (B: bag {t:tuple(BB: bag {tt:tuple(t1:int, t2:int, t3:int)})});
>
> data.txt:
> {({(1,2,3),(4,5,6)}),({(7,8,9),(10,11,12)})}
>
> Daniel

Re: LOAD clause with Bag

Posted by Daniel Dai <ji...@yahoo-inc.com>.
data.txt can be:
({(1,2,3),(4,5,6)})
({(7,8,9),(10,11,12)})

Bags can also be nested, e.g.:
A = LOAD 'data.txt' AS (B: bag {t:tuple(BB: bag {tt:tuple(t1:int, t2:int, t3:int)})});

data.txt:
{({(1,2,3),(4,5,6)}),({(7,8,9),(10,11,12)})}

Daniel

On 06/14/2011 12:46 PM, Saumitra Shahapure wrote:
> Hello,
>
> When a LOAD clause declares a bag as a field, what input file structure is
> expected? Can the default PigStorage() function handle that?
> For example, given
> A = LOAD 'data.txt' AS (B: bag {T: tuple(t1:int, t2:int, t3:int)});
> what structure of data.txt is expected? Is it possible to write a StoreFunc
> in this case?
>
> Also, if we have multidimensional data (like a 2D n*n matrix, where n varies
> with the input), can we expect a bag that contains each row as a nested bag?
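
For the matrix question, here is a rough sketch of how one line of input could be generated in the same nested-bag text layout as the nested example in this thread. It is written in Python, not Pig Latin, and the function name matrix_to_nested_bag is hypothetical. It assumes each row is stored as an inner bag of single-field tuples, which would correspond to a schema along the lines of B: bag {t: tuple(BB: bag {tt: tuple(v:int)})}, where the field name v is also made up for illustration. Whether PigStorage parses the result into that schema on a given Pig version is an assumption worth checking with DESCRIBE and DUMP.

```python
def matrix_to_nested_bag(matrix):
    """Render a 2D list of ints as one line of nested-bag text.

    Each row becomes an inner bag of single-field tuples, and each inner
    bag is wrapped in a tuple inside the outer bag. This mirrors the
    bracket layout of the nested example in the thread; it is a sketch,
    not a reference for what PigStorage accepts.
    """
    rows = []
    for row in matrix:
        # One single-field tuple per element, so the row length n can vary.
        inner_bag = "{" + ",".join("(%d)" % value for value in row) + "}"
        # Wrap the inner bag in a tuple, since a bag holds tuples.
        rows.append("(" + inner_bag + ")")
    return "{" + ",".join(rows) + "}"


if __name__ == "__main__":
    # One matrix per output line.
    print(matrix_to_nested_bag([[1, 2], [3, 4]]))
```

Because every element sits in its own tuple, the same schema covers any n, at the cost of a more verbose file than the fixed three-column tuples used earlier in the thread.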