Posted to user@pig.apache.org by Badrinarayanan S <ba...@fifthgentech.com> on 2011/04/08 12:08:31 UTC

Dereferencing columns of nested bags

Is it possible to dereference a column that is part of a nested bag? In the
schema given below, I am trying to dereference the columns Key and Value,
which are part of the visit bag, which is itself part of the visits bag.

 

(id, visits:bag{visittuple:tuple(timestamp,
visit:bag{details:tuple(Key:chararray, Value:chararray)})})

 

From the SVN trunk of Pig I could see a fix for this (PIG-1866, the fix for
dereferencing a bag within a tuple not working). Does it also address
nested bags? If so, can the above example be dereferenced as
visits.visit.Key? I tried it against the latest trunk, but it failed.

 

Thanks,

badri



Disclaimer: This message (including any attachments) is being sent from Fifth Generation Technologies India (P) Ltd. (5G) and may contain information that is proprietary, confidential and privileged. If you are not the intended recipient, please inform the sender immediately by reply e-mail and delete this message and attachments from your system, without retaining a copy. Any unauthorized use or dissemination of this message in whole or in part is strictly prohibited. 5G shall  not be liable for the improper or incomplete transmission of the information contained in this  communication nor for any delay in its receipt or damage to your system. 5G does not guarantee that the integrity of this communication has been maintained nor that this communication is free of viruses, interceptions or interference.

Re: Dereferencing columns of nested bags

Posted by Daniel Dai <ji...@yahoo-inc.com>.
Yes, this was a bug, and it is addressed in
https://issues.apache.org/jira/browse/PIG-1866.

Daniel

On 04/09/2011 04:14 AM, Mridul Muralidharan wrote:
> If you try to project that out, you will end up with exceptions - which
> was the issue being raised (not the expected functionality - which is
> understood well : whether flatten is required or not depends on the
> script/udf's in question).
>
>
> To illustrate, please try :
>
> A = load 'some_file' AS (id, visits:bag{visittuple:tuple(timestamp,
> visit:bag{details:tuple(Key:chararray, Value:chararray)})});
> B = FOREACH A generate $1.$1;
>
> This will result in exceptions.
>
>
> While
>
> A = load 'some_file' AS (id, visits:bag{visittuple:tuple(timestamp,
> visit:bag{details:tuple(Key:chararray, Value:chararray)})});
> B = FOREACH A generate $1.$0;
>
> works.
>
>
>
>
> Regards,
> Mridul


RE: Dereferencing columns of nested bags

Posted by Badrinarayanan S <ba...@fifthgentech.com>.
Thanks Mridul and Daniel. 

For now I am flattening the bags and then referencing the details. It works,
but it takes a few additional steps. I would like to see the fix for this in
the trunk.

Regards,
badri

-----Original Message-----
From: Mridul Muralidharan [mailto:mridulm@yahoo-inc.com] 
Sent: Saturday, April 09, 2011 4:45 PM
To: user@pig.apache.org
Cc: Daniel Dai
Subject: Re: Dereferencing columns of nested bags


If you try to project that out, you will end up with exceptions - which 
was the issue being raised (not the expected functionality - which is 
understood well : whether flatten is required or not depends on the 
script/udf's in question).


To illustrate, please try :

A = load 'some_file' AS (id, visits:bag{visittuple:tuple(timestamp, 
visit:bag{details:tuple(Key:chararray, Value:chararray)})});
B = FOREACH A generate $1.$1;

This will result in exceptions.


While

A = load 'some_file' AS (id, visits:bag{visittuple:tuple(timestamp, 
visit:bag{details:tuple(Key:chararray, Value:chararray)})});
B = FOREACH A generate $1.$0;

works.




Regards,
Mridul




Re: Dereferencing columns of nested bags

Posted by Mridul Muralidharan <mr...@yahoo-inc.com>.
If you try to project that out, you will end up with exceptions, which
was the issue being raised (not the expected functionality, which is
well understood: whether flatten is required or not depends on the
script/UDFs in question).


To illustrate, please try :

A = load 'some_file' AS (id, visits:bag{visittuple:tuple(timestamp, 
visit:bag{details:tuple(Key:chararray, Value:chararray)})});
B = FOREACH A generate $1.$1;

This will result in exceptions.


While

A = load 'some_file' AS (id, visits:bag{visittuple:tuple(timestamp, 
visit:bag{details:tuple(Key:chararray, Value:chararray)})});
B = FOREACH A generate $1.$0;

works.




Regards,
Mridul


On Saturday 09 April 2011 02:39 AM, Daniel Dai wrote:
> Bag dereference results a bag with less columns. It does not reduce the
> nested levels.
>
> $1 refer to  visits: {(timestamp: bytearray,visit: {(Key:
> chararray,Value: chararray)})}
> $1.$1 slice the second column of the bag, all it does is drop timestamp
> column from bag "visits". The bag is still there. The schema for $1.$1
> is {(visit: {(Key: chararray,Value: chararray)})}. Note the mechanism is
> different than tuple. If $1 is a tuple, $1.$1 does reduce one level and
> get nested item out.
>
> To reduce the level of a bag, you can only flatten the bag.
>
> Daniel


NullPointerException at POProject.consumeInputBag

Posted by Badrinarayanan S <ba...@fifthgentech.com>.
Hi,

I am working with the schema below against the latest trunk of Pig.

Visits = LOAD 'cassandra://MyTest/MyCF' USING CassandraStorage() as
(Id:chararray, Details:bag{DetailsTuple:tuple(VisitTimestamp:chararray,
VisitDetails:bag{VisitColumns:tuple(ColumnName:chararray,
ColumnValue:chararray)})});

Against the Visits relation, if I do a FOREACH ... GENERATE for
Details.VisitDetails or for any column of the Details bag, I get the
exception below. However, when I include the Id column ($0), the GENERATE
works fine. In other words, whenever I leave out the Id column, the
FOREACH ... GENERATE fails with the exception below.
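
The two cases described above can be written out as a minimal sketch. The
aliases B_fails and B_works are made up for illustration; the LOAD statement
is the one from this message, and the behaviour noted in the comments is only
what is reported here, not something that has been re-verified.

```pig
-- Schema and loader exactly as given in this message.
Visits = LOAD 'cassandra://MyTest/MyCF' USING CassandraStorage() AS
    (Id:chararray, Details:bag{DetailsTuple:tuple(VisitTimestamp:chararray,
    VisitDetails:bag{VisitColumns:tuple(ColumnName:chararray,
    ColumnValue:chararray)})});

-- Reported to fail with the NullPointerException: only a column of the
-- Details bag is projected, without Id.
B_fails = FOREACH Visits GENERATE Details.VisitDetails;

-- Reported to work: the same projection with the Id column ($0) included.
B_works = FOREACH Visits GENERATE Id, Details.VisitDetails;
```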

Not sure what I am missing. Any pointers, please?

Regards,
badri


java.lang.NullPointerException
        at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POProject.consumeInputBag(POProject.java:310)
        at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POProject.getNext(POProject.java:251)
        at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.getNext(PhysicalOperator.java:316)
        at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.processPlan(POForEach.java:332)
        at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.getNext(POForEach.java:284)
        at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:290)
        at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.getNext(POForEach.java:233)
        at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:290)
        at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POFilter.getNext(POFilter.java:95)
        at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:290)
        at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.getNext(POForEach.java:233)
        at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:290)
        at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POLocalRearrange.getNext(POLocalRearrange.java:256)
        at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POUnion.getNext(POUnion.java:165)
        at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.runPipeline(PigMapBase.java:261)
        at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.map(PigMapBase.java:256)
        at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.map(PigMapBase.java:58)
        at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
        at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:621)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
        at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:177)







RE: Dereferencing columns of nested bags

Posted by Badrinarayanan S <ba...@fifthgentech.com>.
It helps. Thanks. 

-----Original Message-----
From: Daniel Dai [mailto:jianyong@yahoo-inc.com] 
Sent: Saturday, April 09, 2011 2:39 AM
To: user@pig.apache.org
Subject: Re: Dereferencing columns of nested bags

Bag dereference results a bag with less columns. It does not reduce the 
nested levels.

$1 refer to  visits: {(timestamp: bytearray,visit: {(Key: 
chararray,Value: chararray)})}
$1.$1 slice the second column of the bag, all it does is drop timestamp 
column from bag "visits". The bag is still there. The schema for $1.$1 
is {(visit: {(Key: chararray,Value: chararray)})}. Note the mechanism is 
different than tuple. If $1 is a tuple, $1.$1 does reduce one level and 
get nested item out.

To reduce the level of a bag, you can only flatten the bag.

Daniel



Re: Dereferencing columns of nested bags

Posted by Daniel Dai <ji...@yahoo-inc.com>.
A bag dereference results in a bag with fewer columns. It does not reduce
the nesting level.

$1 refers to visits: {(timestamp: bytearray,visit: {(Key:
chararray,Value: chararray)})}
$1.$1 slices out the second column of the bag; all it does is drop the
timestamp column from the bag "visits". The bag is still there. The schema
for $1.$1 is {(visit: {(Key: chararray,Value: chararray)})}. Note that the
mechanism is different from that of a tuple: if $1 is a tuple, $1.$1 does
reduce one level and gets the nested item out.
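
One way to check this is to ask Pig for the schema of the projection. This
is only a sketch: 'some_file' is the placeholder path used elsewhere in this
thread, and it assumes a build in which projecting $1.$1 no longer fails.

```pig
A = LOAD 'some_file' AS (id, visits:bag{visittuple:tuple(timestamp,
    visit:bag{details:tuple(Key:chararray, Value:chararray)})});

-- Bag dereference: keeps the outer bag and drops the timestamp column.
B = FOREACH A GENERATE $1.$1;

-- Prints the schema of B. Per the explanation above, it should still be a
-- bag wrapping the inner "visit" bag, not the inner bag on its own.
DESCRIBE B;
```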

To reduce the nesting level of a bag, the only option is to flatten the bag.
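
For the schema in this thread, that could look roughly like the sketch
below: one FLATTEN per level of nesting. The aliases B, C and D and the path
'some_file' are placeholders, and columns are referenced by position to
avoid assuming how the flattened fields end up being named.

```pig
A = LOAD 'some_file' AS (id, visits:bag{visittuple:tuple(timestamp,
    visit:bag{details:tuple(Key:chararray, Value:chararray)})});

-- First FLATTEN removes the outer "visits" bag: one row per visittuple,
-- giving (id, timestamp, visit).
B = FOREACH A GENERATE $0, FLATTEN($1);

-- Second FLATTEN removes the inner "visit" bag: one row per details tuple,
-- giving (id, timestamp, Key, Value).
C = FOREACH B GENERATE $0, $1, FLATTEN($2);

-- Key and Value are now ordinary top-level columns.
D = FOREACH C GENERATE $0, $2, $3;
```

Note that a record whose bag is empty produces no rows after FLATTEN, so
such records drop out of the result.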

Daniel

On 04/08/2011 09:41 AM, Mridul Muralidharan wrote:
> On Friday 08 April 2011 08:16 PM, Badrinarayanan S wrote:
>> I am trying it within foreach as part of generate.
>>
>> I believe the innermost tuple of Key and Value is considered as a single
>> column. So I am able to refer only to $1.$1.($0) which gives the whole
>> tuple. However would it be possible to generate the Key and Value as
>> separate columns as part of foreach.
>>
>> The reference to $1.$1.($0, $1) results in an error like out of bound
>> access.
>
> You are right, it looks pretty broken.
> You can reference $1.$0 but not $1.$1 !
>
> You might want to file a JIRA I guess ...
>
>
>
> If you split it into multiple foreach/flatten invocations, you can get
> to the data you want (but it is not the same functionally since you
> loose record level aggregation that $1.$1.$0 (for ex) provides).
>
>
> Regards,
> Mridul
>
>> Thanks,
>> badri
>>
>> -----Original Message-----
>> From: Mridul Muralidharan [mailto:mridulm@yahoo-inc.com]
>> Sent: Friday, April 08, 2011 4:32 PM
>> To: user@pig.apache.org
>> Cc: Badrinarayanan S
>> Subject: Re: Dereferencing columns of nested bags
>>
>>
>> How are you trying to reference it ? Within foreach ? Filter ? Or
>> elsewhere ?
>>
>> Doesn't something like $1.$1.($0, $1) not work to reference key, value
>> as a tuple ?
>>
>>
>> - Mridul


Re: Dereferencing columns of nested bags

Posted by Mridul Muralidharan <mr...@yahoo-inc.com>.
On Friday 08 April 2011 08:16 PM, Badrinarayanan S wrote:
> I am trying it within foreach as part of generate.
>
> I believe the innermost tuple of Key and Value is considered as a single
> column. So I am able to refer only to $1.$1.($0) which gives the whole
> tuple. However would it be possible to generate the Key and Value as
> separate columns as part of foreach.
>
> The reference to $1.$1.($0, $1) results in an error like out of bound
> access.


You are right, it looks pretty broken.
You can reference $1.$0 but not $1.$1 !

You might want to file a JIRA I guess ...



If you split it into multiple foreach/flatten invocations, you can get 
to the data you want (but it is not functionally the same, since you 
lose the record-level aggregation that $1.$1.$0, for example, provides).
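
A minimal sketch of the foreach/flatten workaround described above, written against the schema from the original question. The input path 'visits_data' is hypothetical, and the chararray types for id and timestamp are assumptions (the original schema leaves them untyped). This is one way to read the suggestion, not a tested script from the thread.

```pig
-- Hypothetical input path; the types of id and timestamp are assumed.
A = LOAD 'visits_data' AS (
      id:chararray,
      visits:bag{visittuple:tuple(
          timestamp:chararray,
          visit:bag{details:tuple(Key:chararray, Value:chararray)})});

-- Step 1: unnest the outer bag, giving one row per visittuple.
B = FOREACH A GENERATE id, FLATTEN(visits);

-- Step 2: unnest the inner bag. Positional references are used here:
-- $0 = id, $1 = timestamp, $2 = the visit bag.
C = FOREACH B GENERATE $0, $1, FLATTEN($2);

-- C should now hold one row per (id, timestamp, Key, Value).
DUMP C;
```

Because each FLATTEN produces one output row per bag element, Key and Value end up as separate top-level columns, but the rows are no longer grouped per input record. That is the loss of record-level aggregation mentioned above; regrouping (for example with GROUP on id) would be needed to get it back.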


Regards,
Mridul

>
> Thanks,
> badri
>
> -----Original Message-----
> From: Mridul Muralidharan [mailto:mridulm@yahoo-inc.com]
> Sent: Friday, April 08, 2011 4:32 PM
> To: user@pig.apache.org
> Cc: Badrinarayanan S
> Subject: Re: Dereferencing columns of nested bags
>
>
> How are you trying to reference it ? Within foreach ? Filter ? Or
> elsewhere ?
>
> Doesn't something like $1.$1.($0, $1) not work to reference key, value
> as a tuple ?
>
>
> - Mridul
>
>
> On Friday 08 April 2011 03:38 PM, Badrinarayanan S wrote:
>> Is it possible to dereference a column part of a nested bag. In the schema
>> given below, I am trying to dereference the columns Key and Value which is
>> part of visit bag which is part of visits bag.
>>
>>
>>
>> (id, visits:bag{visittuple:tuple(timestamp,
>> visit:bag{details:tuple(Key:chararray, Value:chararray)})})
>>
>>
>>
>> From the SVN trunk of Pig I could see a fix for this (PIG-1866, the fix for
>> dereferencing a bag within tuple does not work). Does it also addresses
>> nested bags? If so for the above example can it be dereferenced as
>> visits.visit.Key. I tried it against the latest trunk, but it failed.
>>
>>
>>
>> Thanks,
>>
>> badri


RE: Dereferencing columns of nested bags

Posted by Badrinarayanan S <ba...@fifthgentech.com>.
I am trying it within a foreach, as part of the generate clause.

I believe the innermost tuple of Key and Value is treated as a single
column, so I am only able to refer to $1.$1.($0), which gives the whole
tuple. Would it be possible to generate Key and Value as separate
columns as part of the foreach?

The reference to $1.$1.($0, $1) results in an out-of-bounds access
error.

Thanks,
badri 

-----Original Message-----
From: Mridul Muralidharan [mailto:mridulm@yahoo-inc.com] 
Sent: Friday, April 08, 2011 4:32 PM
To: user@pig.apache.org
Cc: Badrinarayanan S
Subject: Re: Dereferencing columns of nested bags


How are you trying to reference it ? Within foreach ? Filter ? Or 
elsewhere ?

Doesn't something like $1.$1.($0, $1) not work to reference key, value 
as a tuple ?


- Mridul


On Friday 08 April 2011 03:38 PM, Badrinarayanan S wrote:
> Is it possible to dereference a column part of a nested bag. In the schema
> given below, I am trying to dereference the columns Key and Value which is
> part of visit bag which is part of visits bag.
>
>
>
> (id, visits:bag{visittuple:tuple(timestamp,
> visit:bag{details:tuple(Key:chararray, Value:chararray)})})
>
>
>
> From the SVN trunk of Pig I could see a fix for this (PIG-1866, the fix for
> dereferencing a bag within tuple does not work). Does it also addresses
> nested bags? If so for the above example can it be dereferenced as
> visits.visit.Key. I tried it against the latest trunk, but it failed.
>
>
>
> Thanks,
>
> badri


Re: Dereferencing columns of nested bags

Posted by Mridul Muralidharan <mr...@yahoo-inc.com>.
How are you trying to reference it? Within foreach? Filter? Or 
elsewhere?

Doesn't something like $1.$1.($0, $1) work to reference key, value 
as a tuple?


- Mridul


On Friday 08 April 2011 03:38 PM, Badrinarayanan S wrote:
> Is it possible to dereference a column part of a nested bag. In the schema
> given below, I am trying to dereference the columns Key and Value which is
> part of visit bag which is part of visits bag.
>
>
>
> (id, visits:bag{visittuple:tuple(timestamp,
> visit:bag{details:tuple(Key:chararray, Value:chararray)})})
>
>
>
>  From the SVN trunk of Pig I could see a fix for this (PIG-1866, the fix for
> dereferencing a bag within tuple does not work). Does it also addresses
> nested bags? If so for the above example can it be dereferenced as
> visits.visit.Key. I tried it against the latest trunk, but it failed.
>
>
>
> Thanks,
>
> badri