Posted to user@pig.apache.org by Vincent Barat <vi...@gmail.com> on 2011/04/20 16:07:03 UTC
How to remove the field key from bags tuples after a GROUP ?
Hi,
First, I group 2 tables using a key (named sid):
rich_sessions = GROUP sessions BY sid, activities BY sid;
After this operation, all the tuples in the bag "activities" start
with the same "sid" field.
This field is long (64 bytes) and I would like to remove it from all
activity tuples in order to save space before storing this
rich_sessions in a file.
Is there any way to do this?
Thanks for your help,
Re: How to remove the field key from bags tuples after a GROUP ?
Posted by Vincent Barat <vb...@ubikod.com>.
I will try this; it seems to be what I was looking for.
Thanks!
On 20/04/11 18:12, Sven Krasser wrote:
> Sounds like the "Nested Projection" example in
> http://pig.apache.org/docs/r0.8.0/piglatin_ref2.html#FOREACH is what you're
> looking for.
> -Sven
--
*Vincent BARAT, UBIKOD, CTO*
vbarat@ubikod.com <ma...@ubikod.com> Mob +33 (0)6 15 41 15 18
UBIKOD Paris, c/o ESSEC VENTURES, Avenue Bernard Hirsch, 95021
Cergy-Pontoise cedex, FRANCE, Tel +33 (0)1 34 43 28 89
UBIKOD Rennes, 10 rue Duhamel, 35000 Rennes, FRANCE, Tel. +33 (0)2
99 65 69 13
www.ubikod.com <http://www.ubikod.com/>@ubikod
<http://twitter.com/ubikod>
www.capptain.com <http://www.capptain.com/>@capptain_hq
<http://twitter.com/capptain_hq>
IMPORTANT NOTICE -- UBIKOD and CAPPTAIN are registered trademarks of
UBIKOD S.A.R.L., all copyrights are reserved. The contents of this
email and attachments are confidential and may be subject to legal
privilege and/or protected by copyright. Copying or communicating
any part of it to others is prohibited and may be unlawful. If you
are not the intended recipient you must not use, copy, distribute or
rely on this email and should please return it immediately or notify
us by telephone. At present the integrity of email across the
Internet cannot be guaranteed. Therefore UBIKOD S.A.R.L. will not
accept liability for any claims arising as a result of the use of
this medium for transmissions by or to UBIKOD S.A.R.L.. UBIKOD
S.A.R.L. may exercise any of its rights under relevant law, to
monitor the content of all electronic communications. You should
therefore be aware that this communication and any responses might
have been monitored, and may be accessed by UBIKOD S.A.R.L. The
views expressed in this document are that of the individual and may
not necessarily constitute or imply its endorsement or
recommendation by UBIKOD S.A.R.L. The content of this electronic
mail may be subject to the confidentiality terms of a
"Non-Disclosure Agreement" (NDA).
Re: How to remove the field key from bags tuples after a GROUP ?
Posted by Vincent Barat <vb...@ubikod.com>.
It works! Thanks a lot.
Re: How to remove the field key from bags tuples after a GROUP ?
Posted by Sven Krasser <kr...@gmail.com>.
Sounds like the "Nested Projection" example in
http://pig.apache.org/docs/r0.8.0/piglatin_ref2.html#FOREACH is what you're
looking for.
-Sven
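For readers of the archive, here is a minimal sketch of how the nested projection might be applied to the relations in the question. Only the key sid and the relation names sessions, activities and rich_sessions come from the original post; the other field names (start_time, ts, action), the load paths and the output path are hypothetical placeholders.

```pig
-- Hypothetical schemas and paths: only the sid key is known from the post.
sessions   = LOAD 'sessions'   AS (sid:chararray, start_time:long);
activities = LOAD 'activities' AS (sid:chararray, ts:long, action:chararray);

-- Same grouping as in the question; the key is exposed as "group".
rich_sessions = GROUP sessions BY sid, activities BY sid;

-- Nested projection: list every bag field except sid, so the key is
-- kept once per group instead of once per tuple.
slim_sessions = FOREACH rich_sessions GENERATE
    group AS sid,
    sessions.start_time,
    activities.(ts, action);

STORE slim_sessions INTO 'rich_sessions_slim';
```

The projection lists the fields to keep rather than the field to drop, so a bag with many fields needs each remaining field named explicitly (or referenced by position, such as $1).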
Re: How to remove the field key from bags tuples after a GROUP ?
Posted by sumit ghosh <su...@yahoo.com>.
Hi,
Did you get a chance to look into the PiggyBank String functions?
http://pig.apache.org/docs/r0.7.0/api/org/apache/pig/piggybank/evaluation/string/package-summary.html
I guess you need to use the substring function.
REGISTER <path-to-piggybank>/piggybank.jar;
DEFINE StrSub org.apache.pig.piggybank.evaluation.string.SUBSTRING();
... now you can use the SUBSTRING function as StrSub.
B = FOREACH A GENERATE StrSub(sid, 1, 64);
Hope it helps.
Sumit