Posted to user@pig.apache.org by Kelvin Moss <km...@yahoo.com> on 2009/11/05 11:23:54 UTC

Accessing fields in Tuple

Hi all,
 
I have the following data file
 
(1L,2L,3L)
(4L,2L,1L)
(8L,3L,4L)
 
I am trying to write a UDF (like sum) that would add the fields in Tuple. This works --
 
import java.util.Iterator;
import java.util.List;

import org.apache.pig.EvalFunc;
import org.apache.pig.backend.executionengine.ExecException;
import org.apache.pig.data.Tuple;

public class SumAll extends EvalFunc<Long> {
    public Long exec(Tuple input) {
        try {
            return sum(input);
        } catch (NumberFormatException e) {
            // TODO Auto-generated catch block
            e.printStackTrace();
        } catch (ExecException e) {
            // TODO Auto-generated catch block
            e.printStackTrace();
        }
        return 0L;
    }

    // Treats every element of the outer tuple as a three-field tuple of longs.
    static protected Long sum(Tuple input) throws ExecException, NumberFormatException {
        long sum = 0;

        List<Object> values = input.getAll();
        for (Iterator<Object> it = values.iterator(); it.hasNext();) {
            Tuple t = (Tuple) it.next();
            sum += (Long) t.get(0);
            sum += (Long) t.get(1);
            sum += (Long) t.get(2);
        }
        return sum;
    }
}
 
grunt> A = LOAD 'data2' as aa:bytearray;
grunt> C = FOREACH A GENERATE UDF.SumAll((tuple(long,long,long))aa);
grunt> dump C;
2009-11-05 10:07:09,266 [main] INFO  org.apache.pig.backend.local.executionengine.LocalPigLauncher - Successfully stored result in: "file:/tmp/temp1206478472/tmp-577036369"
2009-11-05 10:07:09,267 [main] INFO  org.apache.pig.backend.local.executionengine.LocalPigLauncher - Records written : 3
2009-11-05 10:07:09,267 [main] INFO  org.apache.pig.backend.local.executionengine.LocalPigLauncher - Bytes written : 0
2009-11-05 10:07:09,267 [main] INFO  org.apache.pig.backend.local.executionengine.LocalPigLauncher - 100% complete!
2009-11-05 10:07:09,267 [main] INFO  org.apache.pig.backend.local.executionengine.LocalPigLauncher - Success!!
(6L)
(7L)
(15L)
grunt>
 
Initially I thought that such a loop would work
 
static protected Long sum(Tuple input) throws ExecException, NumberFormatException {
    long sum = 0;

    List<Object> values = input.getAll(); // Would give all fields in Tuple??
    for (Iterator<Object> it = values.iterator(); it.hasNext();) {
        sum += (Long) it.next(); // this cast is where the error is raised
    }
    return sum;
}
 
But I get an error saying that a Tuple can't be cast to Long. So my question is: what does input.getAll() return? What is the structure of the data that gets passed to the exec function?
 
Thanks! 


      

Re: Accessing fields in Tuple

Posted by Mridul Muralidharan <mr...@yahoo-inc.com>.
Hi Kelvin,

   With tuples and bags, you can have arbitrary levels of 
nesting/composition.
That is, a tuple can contain other tuples/bags, and the tuples within a 
bag can contain other tuples/bags.
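
To make the nesting concrete, here is a minimal plain-Java sketch (not Pig code). A java.util.List<Object> stands in for a Pig tuple, and the class NestedSum and its sum method are made-up names for illustration only. It walks a value recursively, so a tuple of tuples such as ((1,2,3),(4,5,6)) is handled the same way as a flat tuple.

```java
import java.util.Arrays;
import java.util.List;

// Plain-Java model: a List<Object> plays the role of a tuple (assumption, not the Pig API).
final class NestedSum {

    // Adds every Long found at any nesting depth.
    static long sum(Object value) {
        if (value instanceof Long) {
            return (Long) value;
        }
        if (value instanceof List) {
            long total = 0;
            for (Object field : (List<?>) value) {
                total += sum(field); // recurse into nested "tuples"
            }
            return total;
        }
        String type = (value == null) ? "null" : value.getClass().getName();
        throw new IllegalArgumentException("unsupported field type: " + type);
    }

    public static void main(String[] args) {
        List<Object> inner1 = Arrays.<Object>asList(1L, 2L, 3L);
        List<Object> inner2 = Arrays.<Object>asList(4L, 5L, 6L);
        List<Object> outer = Arrays.<Object>asList(inner1, inner2); // ((1,2,3),(4,5,6))
        System.out.println(sum(outer)); // prints 21
    }
}
```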


As Thejas explained, the input to a UDF is always a tuple, so whatever 
parameters you pass in are wrapped in a tuple and sent across.


You probably want to just use:

myUdf($0, $1, $2) and so on, instead of forcing the input to be within 
another tuple.
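
A rough plain-Java sketch of the difference (again with List<Object> standing in for a Pig tuple; FlatSum and sumFields are hypothetical names, not part of Pig): when the fields arrive as separate arguments, the simple "cast each field to Long" loop works, whereas with one tuple argument the only field is itself a tuple and the cast fails.

```java
import java.util.Arrays;
import java.util.List;

// Plain-Java model of the UDF input (assumption: List<Object> stands in for Tuple).
final class FlatSum {

    // Adds the fields of the outer input, assuming every field is a Long.
    static long sumFields(List<Object> input) {
        long sum = 0;
        for (Object field : input) {
            sum += (Long) field; // throws ClassCastException if a field is a nested tuple
        }
        return sum;
    }

    public static void main(String[] args) {
        // Like myUdf($0, $1, $2): three fields, each a long.
        List<Object> separate = Arrays.<Object>asList(1L, 2L, 3L);
        System.out.println(sumFields(separate)); // prints 6

        // Like myUdf(t) with t = (1,2,3): one field, which is itself a tuple.
        List<Object> wrapped = Arrays.<Object>asList(separate);
        try {
            sumFields(wrapped);
        } catch (ClassCastException e) {
            System.out.println("nested tuple cannot be cast to Long");
        }
    }
}
```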

Hope this helps.
Regards,
Mridul




Re: Accessing fields in Tuple

Posted by Thejas Nair <te...@yahoo-inc.com>.
Hi Kelvin,

The input parameters to the UDF are wrapped inside a tuple and then given
as input to the exec function in the UDF.
In the case of
>> grunt> C = FOREACH A GENERATE UDF.SumAll((tuple(long,long,long))aa);
the exec function gets a Tuple with one column, which is a
tuple(long,long,long); i.e., in exec(Tuple input), input.get(0) will return
tuple(long,long,long).

On the other hand, if you called the UDF this way:
>> grunt> C = FOREACH A GENERATE UDF.SumAll((long)a1,(chararray)a2);
then in exec(Tuple input), input.get(0) will return a long and input.get(1)
will return a chararray.
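
The two cases above can be pictured with a small plain-Java model. This is only a sketch: List<Object> stands in for a Pig tuple, and UdfInputModel and wrapArguments are invented names that imitate the wrapping described here; they are not Pig classes or methods.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Plain-Java model of how call arguments end up in the exec input (assumption, not Pig code).
final class UdfInputModel {

    // Hypothetical helper: packs the arguments into one outer "tuple", one field per argument.
    static List<Object> wrapArguments(Object... args) {
        return new ArrayList<>(Arrays.asList(args));
    }

    public static void main(String[] unused) {
        // SumAll((tuple(long,long,long))aa): one argument, which is itself a tuple.
        List<Object> row = Arrays.<Object>asList(1L, 2L, 3L);
        List<Object> oneArg = wrapArguments(row);
        System.out.println(oneArg.size()); // prints 1
        System.out.println(oneArg.get(0)); // prints [1, 2, 3]

        // SumAll((long)a1, (chararray)a2): two arguments, two fields.
        List<Object> twoArgs = wrapArguments(1L, "abc");
        System.out.println(twoArgs.get(0) instanceof Long);   // prints true
        System.out.println(twoArgs.get(1) instanceof String); // prints true
    }
}
```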

I hope this answers your question.

Thanks,
Thejas






Re: Accessing fields in Tuple

Posted by Kelvin Moss <km...@yahoo.com>.
 
Thanks for the reply. I understand that a Tuple can have more than one field. That is why I was expecting Tuple.getAll to return all the fields in the Tuple. But as it turns out, it returns a Tuple. That made me think that maybe Tuple.getAll returns all the Tuples in the Tuple, but a Tuple like this is not valid, right?
 
((1,2,3),(4,5,6))
 
It should be enclosed in a bag like {(1,2,3),(4,5,6)}. Or maybe I am confusing things?
 
Thanks!




      

Re: Accessing fields in Tuple

Posted by Jeff Zhang <zj...@gmail.com>.
The input is the set of arguments you provide to your UDF, and it is of tuple
type. A tuple can have more than one element, which means your UDF can have
more than one argument. Here you provide one argument, which is itself of
tuple type, to your UDF.
So the first element of input is a tuple.


Jeff Zhang

