Posted to user@pig.apache.org by Josh Devins <jo...@gmail.com> on 2010/11/25 20:53:12 UTC

Moving fields to map

Hi all,

I have a simple schema that I want to store as JSON, so I've written a
simple JsonStorage class, but it requires the tuple's first field to be a
map. The problem is converting a regular tuple into a map:

DESCRIBE thing;
> thing: {id: chararray,field1: chararray,field2: chararray}

What the map/JSON should look like:
{ "id": "id0", "foo": "valueFromField1", "bar": "valueFromField2" }

I'd expect this to work, but it seems to be invalid syntax:
jsonStore = FOREACH thing GENERATE
    [ 'id'#id, 'foo'#field1, 'bar'#field2 ] AS json:map[];

ERROR 1000: Error during parsing. Encountered " "[" "[ "" at line 150,
column 23.
Was expecting one of:
    "flatten" ...
    "(" ...
    "-" ...
    "(" ...
    "(" ...
    "(" ...
    "(" ...
    "(" ...

The only way I can get this syntax to work is to use only constants in the
map:
jsonStore = FOREACH thing GENERATE
    [ 'id'#'const', 'foo'#'const', 'bar'#'const' ] AS json:map[];

Is it possible to do what I'm thinking?

Thanks,

Josh
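The JsonStorage class mentioned above is not shown in the thread. As a rough illustration of the last step such a storer performs, here is a minimal plain-Java sketch (the class `JsonSketch` and its methods are hypothetical, not Josh's code) that renders a flat map of string keys and values as a JSON object. It assumes every value is a string or null; a real Pig storer would also have to handle untyped or nested map values. `LinkedHashMap` keeps keys in insertion order, and strings are double-quoted and escaped as JSON requires.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class JsonSketch {
    // Quote a string and escape the characters JSON forbids inside a string literal.
    static String quote(String s) {
        StringBuilder sb = new StringBuilder("\"");
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '"':  sb.append("\\\""); break;
                case '\\': sb.append("\\\\"); break;
                case '\n': sb.append("\\n"); break;
                case '\r': sb.append("\\r"); break;
                case '\t': sb.append("\\t"); break;
                default:
                    if (c < 0x20) {
                        // Other control characters become \\uXXXX escapes.
                        sb.append(String.format("\\u%04x", (int) c));
                    } else {
                        sb.append(c);
                    }
            }
        }
        return sb.append('"').toString();
    }

    // Render a flat map as a JSON object; null values become JSON null.
    // Keys are assumed non-null (Pig map keys are chararrays).
    static String toJson(Map<String, String> m) {
        StringBuilder sb = new StringBuilder("{");
        boolean first = true;
        for (Map.Entry<String, String> e : m.entrySet()) {
            if (!first) {
                sb.append(", ");
            }
            first = false;
            sb.append(quote(e.getKey())).append(": ");
            sb.append(e.getValue() == null ? "null" : quote(e.getValue()));
        }
        return sb.append('}').toString();
    }

    public static void main(String[] args) {
        Map<String, String> m = new LinkedHashMap<>();
        m.put("id", "id0");
        m.put("foo", "valueFromField1");
        m.put("bar", "valueFromField2");
        System.out.println(toJson(m));
    }
}
```

Running `main` prints `{"id": "id0", "foo": "valueFromField1", "bar": "valueFromField2"}`.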

Re: Moving fields to map

Posted by Dmitriy Ryaboy <dv...@gmail.com>.
Yep, we mostly deal with JSON only as an input format, and the first thing
we do is flatten it out.
Working with maps in Pig is cumbersome because of casting issues (map values
are untyped bytearrays that have to be cast explicitly), so I prefer to avoid
them when possible.
-D

On Sat, Nov 27, 2010 at 1:26 AM, Josh Devins <jo...@gmail.com> wrote:

> Thanks Dmitriy, I just needed a sanity check! I've essentially done
> the same thing as you describe, create a UDF to do the conversion but
> of course it would be nice to not have to do that. I assume that other
> people (like you and the other Twitter folks) are then working with
> JSON in Pig by reading in JSON in the first place and never building
> it in Pig as you go?
>
> I think building Maps would be a nice language feature so I'll log it
> as an issue.
>
> Cheers,
>
> Josh

Re: Moving fields to map

Posted by Josh Devins <jo...@gmail.com>.
Thanks Dmitriy, I just needed a sanity check! I've essentially done what
you describe, creating a UDF to do the conversion, but of course it would
be nice not to have to. I assume that other people (like you and the other
Twitter folks) work with JSON in Pig by reading it in as JSON in the first
place, rather than building it up in Pig as you go?

I think building maps would be a nice language feature, so I'll log it
as an issue.

Cheers,

Josh



On 2010-11-26, at 11:39 PM, Dmitriy Ryaboy <dv...@gmail.com> wrote:

> I don't think we've considered building out Maps in Pig this way. You can of
> course run your data through a UDF that would take a tuple whose first
> argument is a list of key names, and invoke it like so:
>
> jsonStore = FOREACH thing GENERATE
>  toMap('id foo bar', *) AS json:map[];
>
> -D

Re: Moving fields to map

Posted by Dmitriy Ryaboy <dv...@gmail.com>.
I don't think we've considered building out maps in Pig this way. You can,
of course, run your data through a UDF that takes a tuple whose first field
is a list of key names (the remaining fields being the values), and invoke
it like so:

jsonStore = FOREACH thing GENERATE
  toMap('id foo bar', *) AS json:map[];

-D
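Pig UDFs are written in Java by extending `org.apache.pig.EvalFunc` and implementing `exec(Tuple input)`. To keep the example self-contained and runnable without the Pig jar, the sketch below shows only the key/value pairing such a `toMap` UDF would perform; the class `ToMapSketch` and its method are hypothetical. It assumes the key names arrive as one space-separated string and that `*` supplies the row's fields in schema order after it. Inside a real UDF, the key string would come from `input.get(0)` and the values from positions 1 through `input.size() - 1`.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class ToMapSketch {
    // Pair a space-separated list of key names with positional values,
    // mirroring the proposed call toMap('id foo bar', *).
    static Map<String, Object> toMap(String keyNames, List<?> values) {
        String[] keys = keyNames.trim().split("\\s+");
        if (keys.length != values.size()) {
            // Fail loudly rather than silently dropping or misaligning fields.
            throw new IllegalArgumentException(
                "expected " + keys.length + " values but got " + values.size());
        }
        // LinkedHashMap preserves the key order given in the first argument.
        Map<String, Object> result = new LinkedHashMap<>();
        for (int i = 0; i < keys.length; i++) {
            result.put(keys[i], values.get(i));
        }
        return result;
    }

    public static void main(String[] args) {
        // One row of `thing`: (id, field1, field2)
        List<String> row = Arrays.asList("id0", "valueFromField1", "valueFromField2");
        System.out.println(toMap("id foo bar", row));
    }
}
```

Running `main` prints `{id=id0, foo=valueFromField1, bar=valueFromField2}`. Checking the key count against the value count is a design choice: a schema change in `thing` then surfaces as an error instead of a silently mislabeled map.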
