Posted to user@pig.apache.org by praveenesh kumar <pr...@gmail.com> on 2012/02/02 10:05:39 UTC

How to use tuples ?

Hi,

I am trying to learn how I can store records in tuples.

Suppose I have a txt file

$ cat tmp.txt

1,2,3,4
2,3,4,5
4,5,5,6

I am doing this
$ pig > A = Load 'tmp.txt' using PigStorage(',') AS
(t:tuple(int:a,int:b,int:c,int:d));
$ pig > Dump A;
I am getting nothing in the output
( )
( )
( )

Can anyone help me understand why this is happening?
Even if I don't use PigStorage, nothing comes out.

Thanks,
Praveenesh

Re: How to use tuples ?

Posted by Daniel Dai <da...@hortonworks.com>.
I guess you mean to load a bag. Your input file should be:
{(1,2,3),(2,4,5)}
{(2,3,4),(2,3,5)}

And the load statement should be:
z = load 'tmp.txt' as (b:{(a0:int,a1:int,a2:int)});

Daniel
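
Once the bag is loaded with the schema above, its inner tuples can be reached in a couple of ways. The following is only a minimal sketch, assuming tmp.txt holds one bag literal per line as shown above; the aliases firsts and flat_z are made up for illustration.

```pig
-- Assumes tmp.txt contains one bag literal per line, e.g. {(1,2,3),(2,4,5)}
z = LOAD 'tmp.txt' AS (b:{(a0:int, a1:int, a2:int)});

-- Projecting a field of a bag yields a bag holding just that field.
firsts = FOREACH z GENERATE b.a0;

-- FLATTEN un-nests the bag, producing one row per inner tuple.
flat_z = FOREACH z GENERATE FLATTEN(b);

DUMP firsts;
DUMP flat_z;
```

Which form is more useful depends on whether the next step should still see one row per input line (the projection) or one row per inner tuple (the flattened relation).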

On Thu, Feb 2, 2012 at 2:43 AM, praveenesh kumar <pr...@gmail.com> wrote:
> Okay, so it's weird.
>
> I was able to run a Pig query using $0.$0
>
> The Pig script I wrote for the data (tmp.txt):
>
> (1,2,3) (2,4,5)
> (2,3,4) (2,3,5)
>
> z = load 'tmp.txt';
> x = foreach z generate $0.$0;
> dump x;
>
> It ran fine the first time, but now it's giving me an error:
>
> ERROR 1066: Unable to open iterator for alias x
>
> org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1066: Unable to open iterator for alias x
>        at org.apache.pig.PigServer.openIterator(PigServer.java:858)
>        at org.apache.pig.tools.grunt.GruntParser.processDump(GruntParser.java:655)
>        at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:303)
>        at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:188)
>        at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:164)
>        at org.apache.pig.tools.grunt.Grunt.run(Grunt.java:69)
>        at org.apache.pig.Main.run(Main.java:523)
>        at org.apache.pig.Main.main(Main.java:148)
>        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>        at java.lang.reflect.Method.invoke(Method.java:597)
>        at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
> Caused by: java.io.IOException: Job terminated with anomalous status FAILED
>        at org.apache.pig.PigServer.openIterator(PigServer.java:850)
>        ... 12 more
> =============================

Re: How to use tuples ?

Posted by praveenesh kumar <pr...@gmail.com>.
Okay, so it's weird.

I was able to run a Pig query using $0.$0

The Pig script I wrote for the data (tmp.txt):

(1,2,3) (2,4,5)
(2,3,4) (2,3,5)

z = load 'tmp.txt';
x = foreach z generate $0.$0;
dump x;

It ran fine the first time, but now it's giving me an error:

ERROR 1066: Unable to open iterator for alias x

org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1066: Unable to open iterator for alias x
        at org.apache.pig.PigServer.openIterator(PigServer.java:858)
        at org.apache.pig.tools.grunt.GruntParser.processDump(GruntParser.java:655)
        at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:303)
        at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:188)
        at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:164)
        at org.apache.pig.tools.grunt.Grunt.run(Grunt.java:69)
        at org.apache.pig.Main.run(Main.java:523)
        at org.apache.pig.Main.main(Main.java:148)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
Caused by: java.io.IOException: Job terminated with anomalous status FAILED
        at org.apache.pig.PigServer.openIterator(PigServer.java:850)
        ... 12 more
=============================

On Thu, Feb 2, 2012 at 3:39 PM, praveenesh kumar <pr...@gmail.com> wrote:

> Okay, got it. Thanks for the guidance.
> Without a schema, we can refer to fields as $0.$0 or $1.$0 and so on, based on
> their positions.
>
> Thanks,
> Praveenesh

Re: How to use tuples ?

Posted by praveenesh kumar <pr...@gmail.com>.
Okay, got it. Thanks for the guidance.
Without a schema, we can refer to fields as $0.$0 or $1.$0 and so on, based on
their positions.

Thanks,
Praveenesh

On Thu, Feb 2, 2012 at 3:28 PM, praveenesh kumar <pr...@gmail.com> wrote:

> One more thing: suppose I have data in tmp.txt like this:
> (1,2,3) (2,4,5)
> (2,3,4) (2,3,5)
>
> So if I use Z1 = Load 'tmp.txt',
> the data will get stored in a bag (right?)
>
> ( (1,2,3), (2,4,5) )
> ( (2,3,4), (2,3,5) )
>
> Now, can I refer to the fields in this case (without a schema)?
>
> B = Foreach Z1 generate Z1.$0;
>
> This generates an error. How can I do it correctly?
>
> Thanks,
> Praveenesh
>
> And if so, how can I refer to the variables inside?
>
> Thanks,
> Praveenesh

Re: How to use tuples ?

Posted by praveenesh kumar <pr...@gmail.com>.
One more thing: suppose I have data in tmp.txt like this:
(1,2,3) (2,4,5)
(2,3,4) (2,3,5)

So if I use Z1 = Load 'tmp.txt',
the data will get stored in a bag (right?)

( (1,2,3), (2,4,5) )
( (2,3,4), (2,3,5) )

Now, can I refer to the fields in this case (without a schema)?

B = Foreach Z1 generate Z1.$0;

This generates an error. How can I do it correctly?

Thanks,
Praveenesh

And if so, how can I refer to the variables inside?

Thanks,
Praveenesh
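
For the question above, one possible approach is sketched here. It assumes the two tuple literals on each line are separated by a tab (PigStorage's default delimiter), and the field names t1, t2, a, b, c, first_a and second_first are made up for illustration. Inside a FOREACH, fields are named directly (by position or by name) rather than through the relation alias Z1, so each row here is a tuple with two tuple-valued fields.

```pig
-- Assumes each line of tmp.txt is: (1,2,3)<TAB>(2,4,5)
Z1 = LOAD 'tmp.txt' AS (t1:tuple(a:int, b:int, c:int),
                        t2:tuple(a:int, b:int, c:int));

-- Whole first tuple of every row, by position.
B = FOREACH Z1 GENERATE $0;

-- Individual values inside the tuples, by name or by position.
-- AS gives the two outputs distinct names.
C = FOREACH Z1 GENERATE t1.a AS first_a, t2.$0 AS second_first;

DUMP B;
DUMP C;
```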

On Thu, Feb 2, 2012 at 3:10 PM, praveenesh kumar <pr...@gmail.com> wrote:

> Thanks Daniel,
> So it means that for all the other complex data types too, the file contents
> need to be in the matching format,
> like tuples in ( ), bags in { }, and maps in [ ].

Re: How to use tuples ?

Posted by praveenesh kumar <pr...@gmail.com>.
Thanks Daniel,
So it means that for all the other complex data types too, the file contents
need to be in the matching format,
like tuples in ( ), bags in { }, and maps in [ ].
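
To make the three formats concrete, here is a small sketch that loads one field of each complex type. The file name complex.txt, its contents, and all field names are hypothetical; the fields are assumed to be tab-separated so that the commas inside the literals are left alone.

```pig
-- Hypothetical complex.txt, one row, fields separated by tabs:
-- (1,2)<TAB>{(3),(4)}<TAB>[name#alice]
C = LOAD 'complex.txt' AS (t:tuple(x:int, y:int),
                           b:bag{r:tuple(v:int)},
                           m:map[]);

-- Tuple and bag fields are projected with '.', map values are looked up with '#'.
D = FOREACH C GENERATE t.x, b.v, m#'name';
DUMP D;
```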



On Thu, Feb 2, 2012 at 2:49 PM, Daniel Dai <da...@hortonworks.com> wrote:

> Hi, Praveenesh,
> Your tmp.txt should be:
> (1,2,3,4)
> (2,3,4,5)
> (4,5,5,6)
>
> And you cannot use "," as the delimiter for PigStorage; otherwise,
> PigStorage will split the line on commas first and then parse the tuple.
>
> Daniel

Re: How to use tuples ?

Posted by Daniel Dai <da...@hortonworks.com>.
Hi, Praveenesh,
Your tmp.txt should be:
(1,2,3,4)
(2,3,4,5)
(4,5,5,6)

And you cannot use "," as the delimiter for PigStorage; otherwise,
PigStorage will split the line on commas first and then parse the tuple.

Daniel
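
Putting the advice above together, a load for the reformatted file might look like the sketch below. It assumes tmp.txt has been rewritten with one parenthesized tuple per line, and it relies on the default tab delimiter so that each line stays a single field. Note also that Pig schemas are written name:type (a:int), not type:name as in the original statement.

```pig
-- Assumes tmp.txt now contains:
-- (1,2,3,4)
-- (2,3,4,5)
-- (4,5,5,6)

-- No USING clause: the default PigStorage delimiter is a tab,
-- so the commas inside the parentheses are not treated as field separators.
A = LOAD 'tmp.txt' AS (t:tuple(a:int, b:int, c:int, d:int));

-- Fields inside the tuple can be projected by name or by position.
B = FOREACH A GENERATE t.a, t.$3;
DUMP B;
```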

On Thu, Feb 2, 2012 at 1:05 AM, praveenesh kumar <pr...@gmail.com> wrote:
> Hi,
>
> I am trying to learn how I can store records in tuples.
>
> Suppose I have a txt file
>
> $ cat tmp.txt
>
> 1,2,3,4
> 2,3,4,5
> 4,5,5,6
>
> I am doing this
> $ pig > A = Load 'tmp.txt' using PigStorage(',') AS
> (t:tuple(int:a,int:b,int:c,int:d));
> $ pig > Dump A;
> I am getting nothing in the output
> ( )
> ( )
> ( )
>
> Can anyone help me understand why this is happening?
> Even if I don't use PigStorage, nothing comes out.
>
> Thanks,
> Praveenesh