Posted to user@pig.apache.org by Vincent Barat <vi...@gmail.com> on 2011/04/20 16:07:03 UTC

How to remove the field key from bags tuples after a GROUP ?

Hi,

First, I group 2 tables using a key (named sid):

rich_sessions = GROUP sessions BY sid, activities BY sid;

After this operation, every tuple in the "activities" bag still starts
with the same "sid" field.
This field is long (64 bytes), and I would like to remove it from all
activity tuples to save space before storing rich_sessions in a file.

Is there any way to do this?

Thanks for your help,

Re: How to remove the field key from bags tuples after a GROUP ?

Posted by Vincent Barat <vb...@ubikod.com>.
I will try this; it seems to be what I was looking for.
Thanks!

On 20/04/11 18:12, Sven Krasser wrote:
> Sounds like the "Nested Projection" example in
> http://pig.apache.org/docs/r0.8.0/piglatin_ref2.html#FOREACH is what you're
> looking for.
> -Sven

-- 

*Vincent BARAT, UBIKOD, CTO*


vbarat@ubikod.com <ma...@ubikod.com>  Mob +33 (0)6 15 41 15 18

UBIKOD Paris, c/o ESSEC VENTURES, Avenue Bernard Hirsch, 95021 
Cergy-Pontoise cedex, FRANCE, Tel +33 (0)1 34 43 28 89

UBIKOD Rennes, 10 rue Duhamel, 35000 Rennes, FRANCE, Tel. +33 (0)2 
99 65 69 13


www.ubikod.com <http://www.ubikod.com/>@ubikod 
<http://twitter.com/ubikod>

www.capptain.com <http://www.capptain.com/>@capptain_hq 
<http://twitter.com/capptain_hq>


IMPORTANT NOTICE -- UBIKOD and CAPPTAIN are registered trademarks of 
UBIKOD S.A.R.L., all copyrights are reserved.  The contents of this 
email and attachments are confidential and may be subject to legal 
privilege and/or protected by copyright. Copying or communicating 
any part of it to others is prohibited and may be unlawful. If you 
are not the intended recipient you must not use, copy, distribute or 
rely on this email and should please return it immediately or notify 
us by telephone. At present the integrity of email across the 
Internet cannot be guaranteed. Therefore UBIKOD S.A.R.L. will not 
accept liability for any claims arising as a result of the use of 
this medium for transmissions by or to UBIKOD S.A.R.L.. UBIKOD 
S.A.R.L. may exercise any of its rights under relevant law, to 
monitor the content of all electronic communications. You should 
therefore be aware that this communication and any responses might 
have been monitored, and may be accessed by UBIKOD S.A.R.L. The 
views expressed in this document are that of the individual and may 
not necessarily constitute or imply its endorsement or 
recommendation by UBIKOD S.A.R.L. The content of this electronic 
mail may be subject to the confidentiality terms of a 
"Non-Disclosure Agreement" (NDA).


Re: How to remove the field key from bags tuples after a GROUP ?

Posted by Vincent Barat <vb...@ubikod.com>.
It works! Thanks a lot.


Re: How to remove the field key from bags tuples after a GROUP ?

Posted by Sven Krasser <kr...@gmail.com>.
Sounds like the "Nested Projection" example in
http://pig.apache.org/docs/r0.8.0/piglatin_ref2.html#FOREACH is what you're
looking for.
-Sven
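
For readers following along, here is a minimal sketch of how nested projection might be applied to this case. The field names start_time, name and ts, as well as the load and store paths, are hypothetical: the thread does not give the schemas of sessions or activities, so adjust them to your data.

```pig
-- Hypothetical schemas and paths; only the relation names and "sid" come from the thread.
sessions   = LOAD 'sessions'   AS (sid:chararray, start_time:long);
activities = LOAD 'activities' AS (sid:chararray, name:chararray, ts:long);

-- Grouping two relations on the same key; each output tuple holds the key
-- (as "group") plus one bag per input relation.
rich_sessions = GROUP sessions BY sid, activities BY sid;

-- Nested projection: keep the key once, and project only the non-key
-- fields out of each bag, so "sid" is no longer repeated in every tuple.
slim_sessions = FOREACH rich_sessions GENERATE
    group AS sid,
    sessions.start_time,
    activities.(name, ts);

STORE slim_sessions INTO 'rich_sessions_slim';
```

The key survives once per output tuple as the first field, so nothing is lost by dropping it from the bags.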


Re: How to remove the field key from bags tuples after a GROUP ?

Posted by sumit ghosh <su...@yahoo.com>.
Hi,

Did you get a chance to look into the PiggyBank String functions?

http://pig.apache.org/docs/r0.7.0/api/org/apache/pig/piggybank/evaluation/string/package-summary.html

I guess you need to use the substring function.

REGISTER <path-to-piggybank>/piggybank.jar;
DEFINE StrSub org.apache.pig.piggybank.evaluation.string.SUBSTRING();

... now you can use the SUBSTRING function as StrSub:
B = FOREACH A GENERATE StrSub(sid, 1, 64);

Hope it helps.
Sumit


