Posted to user@pig.apache.org by Kelvin Moss <km...@yahoo.com> on 2009/11/05 11:23:54 UTC
Accessing fields in Tuple
Hi all,
I have the following data file:
(1L,2L,3L)
(4L,2L,1L)
(8L,3L,4L)
I am trying to write a UDF (like sum) that would add the fields in Tuple. This works --
import java.util.Iterator;
import java.util.List;
import org.apache.pig.EvalFunc;
import org.apache.pig.backend.executionengine.ExecException;
import org.apache.pig.data.Tuple;

public class SumAll extends EvalFunc<Long> {
    public Long exec(Tuple input) {
        try {
            return sum(input);
        } catch (NumberFormatException e) {
            // TODO Auto-generated catch block
            e.printStackTrace();
        } catch (ExecException e) {
            // TODO Auto-generated catch block
            e.printStackTrace();
        }
        return 0L;
    }
    static protected Long sum(Tuple input) throws ExecException, NumberFormatException {
        long sum = 0;
        List<Object> values = input.getAll();
        for (Iterator<Object> it = values.iterator(); it.hasNext();) {
            Tuple t = (Tuple)it.next();
            sum += (Long)t.get(0);
            sum += (Long)t.get(1);
            sum += (Long)t.get(2);
        }
        return sum;
    }
}
grunt> A = LOAD 'data2' as aa:bytearray;
grunt> C = FOREACH A GENERATE UDF.SumAll((tuple(long,long,long))aa);
grunt> dump C;
2009-11-05 10:07:09,266 [main] INFO org.apache.pig.backend.local.executionengine.LocalPigLauncher - Successfully stored result in: "file:/tmp/temp1206478472/tmp-577036369"
2009-11-05 10:07:09,267 [main] INFO org.apache.pig.backend.local.executionengine.LocalPigLauncher - Records written : 3
2009-11-05 10:07:09,267 [main] INFO org.apache.pig.backend.local.executionengine.LocalPigLauncher - Bytes written : 0
2009-11-05 10:07:09,267 [main] INFO org.apache.pig.backend.local.executionengine.LocalPigLauncher - 100% complete!
2009-11-05 10:07:09,267 [main] INFO org.apache.pig.backend.local.executionengine.LocalPigLauncher - Success!!
(6L)
(7L)
(15L)
grunt>
Initially I thought that such a loop would work
static protected Long sum(Tuple input) throws ExecException, NumberFormatException {
    long sum = 0;
    List<Object> values = input.getAll(); // Would give all fields in Tuple??
    for (Iterator<Object> it = values.iterator(); it.hasNext();) {
        sum += (Long)it.next();
    }
    return sum;
}
But I get an error saying that a Tuple can't be cast to Long. So my question is: what is input.getAll() returning? What is the structure of the data that gets passed to the exec function?
Thanks!
Re: Accessing fields in Tuple
Posted by Mridul Muralidharan <mr...@yahoo-inc.com>.
Hi Kelvin,
With tuples and bags, you can have arbitrary levels of
nesting/composition.
That is, a tuple can contain other tuples/bags, and the tuples within a
bag can contain other tuples/bags.
As Thejas explained, the input to a UDF is always a tuple, so whatever
parameters you pass in are wrapped in a tuple and sent across.
You probably want to just use:
myUdf($0, $1, $2) and so on, instead of forcing the input to be within
another tuple.
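A minimal sketch of the difference between the two call styles, assuming plain java.util.List objects as stand-ins for Pig tuples so that it runs without Pig on the classpath. The class name SumFieldsSketch and its sum method are hypothetical and not part of Pig. With separate arguments, every field of the input is a Long; with a single tuple argument, the only field is itself a tuple, so the code has to descend one level.

```java
import java.util.Arrays;
import java.util.List;

// Stand-in model only: a java.util.List plays the role of a Pig Tuple.
class SumFieldsSketch {
    // Adds up every Long field, descending into nested lists
    // (which stand in for tuples nested inside the input tuple).
    static long sum(List<?> fields) {
        long total = 0;
        for (Object field : fields) {
            if (field instanceof Long) {
                total += (Long) field;
            } else if (field instanceof List) {
                total += sum((List<?>) field);
            } else {
                throw new IllegalArgumentException("unsupported field: " + field);
            }
        }
        return total;
    }

    public static void main(String[] args) {
        // Shape when the fields are passed as separate arguments.
        List<Object> flat = Arrays.<Object>asList(1L, 2L, 3L);
        // Shape when one tuple is passed as the only argument.
        List<Object> nested = Arrays.<Object>asList(Arrays.<Object>asList(1L, 2L, 3L));
        System.out.println(sum(flat));
        System.out.println(sum(nested));
    }
}
```

Both shapes give the same total here only because the sketch checks each field's type before casting; a loop that casts every field straight to Long works for the flat shape alone.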
Hope this helps.
Regards,
Mridul
Re: Accessing fields in Tuple
Posted by Thejas Nair <te...@yahoo-inc.com>.
Hi Kelvin,
The input parameters to the UDF are wrapped inside a tuple, which is then
given as input to the exec function in the UDF.
In case of -
>> grunt> C = FOREACH A GENERATE UDF.SumAll((tuple(long,long,long))aa);
The exec function gets a Tuple with one column, which is a
tuple(long,long,long),
i.e. in exec(Tuple input), input.get(0) will return tuple(long,long,long).
On the other hand if you called the udf this way -
>> grunt> C = FOREACH A GENERATE UDF.SumAll((long)a1,(chararray)a2);
then in exec(Tuple input), input.get(0) will return a long and input.get(1)
will return a chararray.
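The two cases can be pictured with a small stand-in model. This is only a sketch under assumptions: java.util.List takes the place of Pig's Tuple so the example runs without Pig, and UdfInputShape and wrapArgs are hypothetical names, not Pig API. It shows how the number of fields in the input follows the number of arguments in the call.

```java
import java.util.Arrays;
import java.util.List;

// Stand-in model only: a java.util.List plays the role of a Pig Tuple.
class UdfInputShape {
    // Mimics the wrapping described above: the call's arguments become
    // the fields of one outer "tuple", in order.
    static List<Object> wrapArgs(Object... args) {
        return Arrays.asList(args);
    }

    public static void main(String[] unused) {
        // Like SumAll((tuple(long,long,long))aa): one argument, itself a tuple.
        List<Object> nested = wrapArgs(Arrays.<Object>asList(1L, 2L, 3L));
        // Like SumAll((long)a1, (chararray)a2): two scalar arguments.
        List<Object> flat = wrapArgs(1L, "abc");

        System.out.println(nested.size() + " field(s), first is a "
                + nested.get(0).getClass().getSimpleName());
        System.out.println(flat.size() + " field(s), first is a "
                + flat.get(0).getClass().getSimpleName());
    }
}
```

In this model, iterating over the first input yields a single element that is itself a list, which is why casting it directly to Long fails.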
I hope this answers your question.
Thanks,
Thejas
Re: Accessing fields in Tuple
Posted by Kelvin Moss <km...@yahoo.com>.
Thanks for the reply. I understand that a Tuple can have more than one field. That is why I was expecting Tuple.getAll to return all the fields in the Tuple. But as it turns out, it returns a Tuple. That made me think that maybe Tuple.getAll returns all the Tuples in the Tuple, but a Tuple like this is not valid, right?
((1,2,3),(4,5,6))
It should be enclosed in a bag like {(1,2,3),(4,5,6)}. Or maybe I am confusing things?
Thanks!
Re: Accessing fields in Tuple
Posted by Jeff Zhang <zj...@gmail.com>.
The input is the set of arguments you provide to your UDF. It is of tuple
type. A tuple can have more than one element, which means your UDF can take
more than one argument. Here you provide one argument, which is itself of
tuple type, to your UDF.
So the first element of input is a tuple.
Jeff Zhang