Posted to user@pig.apache.org by "Chan, Tim" <tc...@edmunds.com> on 2012/12/08 19:48:13 UTC

Simplifying schema?

After many joins, my relation's schema became very verbose.

For example:

e::d::c::b::a::column1:bytearray, e::d::c::b::a::column2:bytearray

Is there a way to simplify the schema back to:

column1:bytearray, column2:bytearray

I seem to be able to achieve this by doing a STORE then LOAD, but this
doesn't seem very efficient.

Re: Simplifying schema?

Posted by Jesse Jackson <je...@gmail.com>.
Good afternoon Tim,
  What I've done is follow up with a command like this:

NewBag = foreach oldBag generate e::d::c::b::a::column1 as column1,
e::d::c::b::a::column2 as column2;

Then you're down to more manageable names.

-JJ


Re: Simplifying schema?

Posted by Aaron Zimmerman <az...@sproutsocial.com>.
You can still reference the columns by their short names (column1,
column2). The only time you need the fully qualified name is when the
relation contains duplicate column names; in that case the duplicates
carry different qualifier prefixes, because they came from differently
named relations. That can happen if you load from more than one data
source, or if you do a self join.
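A minimal Pig Latin sketch of the name resolution described above, combined with the AS aliasing from Jesse's reply. The input paths and field names (input_a, input_b, id, other) are hypothetical. After the join, column1, column2 and other each appear on only one side, so their short names resolve unambiguously, while id appears on both sides and must be qualified.

```pig
-- Hypothetical inputs: both relations carry an 'id' field.
a = LOAD 'input_a' AS (id:int, column1:bytearray, column2:bytearray);
b = LOAD 'input_b' AS (id:int, other:chararray);

ab = JOIN a BY id, b BY id;
-- Fields of ab: a::id, a::column1, a::column2, b::id, b::other

-- column1, column2 and other are unique, so short names work;
-- id is duplicated, so it must be written as a::id (or b::id).
-- The AS clauses give every output field a short name.
slim = FOREACH ab GENERATE a::id AS id, column1 AS column1,
                           column2 AS column2, other AS other;
DESCRIBE slim;
```

The FOREACH is a projection over the joined relation, so, unlike a STORE followed by a LOAD, it renames the fields without writing the intermediate data out and reading it back.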


On 12/10/12 7:47 AM, "Lauren Blau" <la...@digitalreasoning.com>
wrote:

>yeah, this is really annoying. I'd love to see an option to
>automatically strip these 'parentage' values for unique names.



Re: Simplifying schema?

Posted by Lauren Blau <la...@digitalreasoning.com>.
yeah, this is really annoying. I'd love to see an option to
automatically strip these 'parentage' values for unique names.
