Posted to user@pig.apache.org by ey-chih chow <ey...@gmail.com> on 2013/11/27 23:14:13 UTC

Pig syntax to access fields of records in an array

Hi,

We have an Avro file in which one field is an array of tuples, as follows:


cam:bag{ARRAY_ELEM:tuple(BIDCOUNT: int, ...


I tried to access BIDCOUNT with 'cam.BIDCOUNT', but it does not work.  Does
anybody know how to access BIDCOUNT?  Thanks.


Ey-Chih Chow

Re: Pig syntax to access fields of records in an array

Posted by ey-chih chow <ey...@gmail.com>.
Basically, I would like to know: if a field f of an Avro record is of type
array, how can I access the first element of that field in Pig?  Thanks.

Ey-Chih Chow


On Thu, Nov 28, 2013 at 12:52 PM, ey-chih chow <ey...@gmail.com> wrote:

> Sorry, the Avro schema of the field in my previous post should be:
>
> {
>     "name" : "com",
>     "type" : {
>       "type" : "array",
>       "items" : {
>         "type" : "record",
>         "name" : "campaignRecord",
>         "doc" : "RTB json logs flattened.",
>         "fields" : [ {
>           "name" : "BIDCOUNT",
>           "type" : "int"
>         }]
>       }
>     }
> }
>
> Thanks.
>
>
> Ey-Chih Chow

Re: Pig syntax to access fields of records in an array

Posted by ey-chih chow <ey...@gmail.com>.
Sorry, the Avro schema of the field in my previous post should be:

{
    "name" : "com",
    "type" : {
      "type" : "array",
      "items" : {
        "type" : "record",
        "name" : "campaignRecord",
        "doc" : "RTB json logs flattened.",
        "fields" : [ {
          "name" : "BIDCOUNT",
          "type" : "int"
        }]
      }
    }
}

Thanks.


Ey-Chih Chow


On Thu, Nov 28, 2013 at 12:33 PM, ey-chih chow <ey...@gmail.com> wrote:

> I have a Pig script that begins with a LOAD statement reading data from an
> Avro file.  The file's schema has a field com, defined as follows:
>
> {
>     "name" : "com",
>     "type" : {
>       "type" : "array",
>       "items" : {
>         "type" : "record",
>         "name" : "campaignRecord",
>         "doc" : "RTB json logs flattened.",
>         "fields" : [ {
>           "name" : "BIDCOUNT",
>           "type" : "int"
>         }}
> }
> }
>
>
> After the LOAD statement, a GROUP BY statement groups on some other fields.
> After the GROUP BY, we have the following statement:
>
> FOREACH gstmt GENERATE group AS key, SUM(RTBALLLOGS.com.BIDCOUNT) AS BIDCOUNT;
>
> This statement fails with the following message when I debug the script in
> Eclipse:
>
> Cannot find field BIDCOUNT in com:bag{ARRAY_ELEM:tuple(BIDCOUNT:int)}
>
> Thanks.
>
> Ey-Chih

Re: Pig syntax to access fields of records in an array

Posted by ey-chih chow <ey...@gmail.com>.
I have a Pig script that begins with a LOAD statement reading data from an
Avro file.  The file's schema has a field com, defined as follows:

{
    "name" : "com",
    "type" : {
      "type" : "array",
      "items" : {
        "type" : "record",
        "name" : "campaignRecord",
        "doc" : "RTB json logs flattened.",
        "fields" : [ {
          "name" : "BIDCOUNT",
          "type" : "int"
        }}
}
}


After the LOAD statement, a GROUP BY statement groups on some other fields.
After the GROUP BY, we have the following statement:

FOREACH gstmt GENERATE group AS key, SUM(RTBALLLOGS.com.BIDCOUNT) AS BIDCOUNT;

This statement fails with the following message when I debug the script in
Eclipse:

Cannot find field BIDCOUNT in com:bag{ARRAY_ELEM:tuple(BIDCOUNT:int)}

Thanks.

Ey-Chih


On Thu, Nov 28, 2013 at 9:39 AM, Ruslan Al-Fakikh <me...@gmail.com> wrote:

> I think your expression ends up with a bag with just that column. Can you
> give the full context where it is not working?

Re: Pig syntax to access fields of records in an array

Posted by Ruslan Al-Fakikh <me...@gmail.com>.
I think your expression ends up with a bag with just that column. Can you
give the full context where it is not working?
On Nov 28, 2013 at 2:14, "ey-chih chow" <ey...@gmail.com> wrote:

> Hi,
>
> We have an Avro file in which one field is an array of tuples, as follows:
>
>
> cam:bag{ARRAY_ELEM:tuple(BIDCOUNT: int, ...
>
>
> I tried to access BIDCOUNT with 'cam.BIDCOUNT', but it does not work.  Does
> anybody know how to access BIDCOUNT?  Thanks.
>
>
> Ey-Chih Chow
>
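
A possible reading of the error reported above: after the GROUP, RTBALLLOGS is
itself a bag, so RTBALLLOGS.com projects to a bag whose tuples each hold
another bag (com), and BIDCOUNT cannot be looked up directly at that outer
level. One common workaround is to FLATTEN the array before grouping. The
sketch below is untested against the original data and rests on several
assumptions: the input path and the grouping field (key_field) are
hypothetical, AvroStorage is taken to be available (built in from Pig 0.12,
otherwise registered from piggybank), and the array records are taken to carry
only BIDCOUNT, as in the posted schema.

```pig
-- Hypothetical input path; AvroStorage is assumed to be available.
RTBALLLOGS = LOAD 'rtb_logs.avro' USING AvroStorage();

-- Emit one row per element of the com array, so BIDCOUNT becomes a
-- top-level int field. key_field is a hypothetical stand-in for the
-- real grouping fields.
flat = FOREACH RTBALLLOGS GENERATE key_field, FLATTEN(com) AS (BIDCOUNT:int);

-- Group the flattened rows instead of the original relation.
gstmt = GROUP flat BY key_field;

-- flat.BIDCOUNT is now a bag of single-int tuples, which SUM accepts.
result = FOREACH gstmt GENERATE group AS key, SUM(flat.BIDCOUNT) AS BIDCOUNT;
```

Note that FLATTEN produces no rows for an input row whose com bag is empty, so
such rows do not contribute to any group in this sketch.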