Posted to user@pig.apache.org by Shawn Hermans <sh...@gmail.com> on 2013/03/12 19:17:22 UTC
Read Hive LazySimpleSerde with Pig
All,
Is there an easy way to read Hive LazySimpleSerde encoded files in Pig? I
did some research and found support for Hive's columnar format and for
SequenceFiles, but did not see anything for LazySimpleSerde.
Thanks,
Shawn
Re: Read Hive LazySimpleSerde with Pig
Posted by Shawn Hermans <sh...@gmail.com>.
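Solved the issue with a Jython UDF.

Pig script:

    REGISTER 'lazysimpleserde.py' USING jython AS myfuncs;
    -- ^A is Hive's default field delimiter; params is the serialized map column.
    A = LOAD '000000_0' USING PigStorage('\u0001') AS (params:chararray);
    B = FOREACH A GENERATE myfuncs.extractMap(params);

lazysimpleserde.py:

    @outputSchema("params:map[]")
    def extractMap(lazy_map):
        # Map entries are separated by ^B (\x02); key and value by ^C (\x03).
        extracted = {}
        entries = lazy_map.split('\x02')
        for entry in entries:
            split_entry = entry.split('\x03')
            # Skip malformed entries that are not exactly key^Cvalue.
            if len(split_entry) == 2:
                extracted[split_entry[0]] = split_entry[1]
        return extracted
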
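A slightly more defensive variant of the same parsing logic is sketched below. It is not part of the original solution: the name extract_map_safe and the HIVE_NULL constant are made up for illustration, and it assumes the table uses Hive's default NULL marker (the two characters \N) and that Pig passes None for a null field. It is written as plain Python so it can be tried outside Pig; inside a Jython UDF file it would need the same @outputSchema("params:map[]") decorator as above.

```python
HIVE_NULL = '\\N'  # assumed: Hive's default text marker for NULL


def extract_map_safe(lazy_map):
    """Parse a ^B/^C-delimited map string into a dict (hypothetical helper)."""
    # A null field, or a field holding the NULL marker, yields no map at all.
    if lazy_map is None or lazy_map == HIVE_NULL:
        return None
    extracted = {}
    for entry in lazy_map.split('\x02'):
        # partition splits on the first ^C only, so a value that itself
        # contains ^C is kept instead of the whole entry being dropped.
        key, sep, value = entry.partition('\x03')
        if not sep:
            continue  # no ^C present: not a key/value pair, skip it
        extracted[key] = None if value == HIVE_NULL else value
    return extracted
```

The main behavioral difference from extractMap is that entries with more than one ^C are kept rather than skipped, which may or may not be what a given data set needs.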
On Tue, Mar 12, 2013 at 4:35 PM, Shawn Hermans <sh...@gmail.com> wrote:
> It uses ^A as the field separator. That would be easy enough, as I could
> just use PigStorage('\u0001') to pull in the fields. The only issue is how
> to extract maps. It uses ^B to separate entries within the map and ^C to
> separate each key from its value. It wouldn't be too difficult to write
> a UDF to parse the map entries; I was just wondering if there was a
> built-in way of doing that.
>
> Thanks,
> Shawn
>
>
> On Tue, Mar 12, 2013 at 2:53 PM, Dmitriy Ryaboy <dv...@gmail.com> wrote:
>
>> How does LazySimpleSerde store data?
>>
>>
>> On Tue, Mar 12, 2013 at 11:17 AM, Shawn Hermans <shawnhermans@gmail.com> wrote:
>>
>> > All,
>> > Is there an easy way to read Hive LazySimpleSerde encoded files in Pig? I
>> > did some research and found support for Hive's columnar format and for
>> > SequenceFiles, but did not see anything for LazySimpleSerde.
>> >
>> > Thanks,
>> > Shawn
>> >
>>
>
>
Re: Read Hive LazySimpleSerde with Pig
Posted by Shawn Hermans <sh...@gmail.com>.
It uses ^A as the field separator. That would be easy enough, as I could just
use PigStorage('\u0001') to pull in the fields. The only issue is how to
extract maps. It uses ^B to separate entries within the map and ^C to
separate each key from its value. It wouldn't be too difficult to write
a UDF to parse the map entries; I was just wondering if there was a
built-in way of doing that.
Thanks,
Shawn
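To make the layout concrete, here is a small sketch that serializes one record by hand. It assumes Hive's default delimiters (^A between fields, ^B between map entries, ^C between a key and its value), which are the characters the Jython UDF in this thread splits on. The helper names encode_map and encode_row and the sample values are invented for illustration and are not part of Hive or Pig.

```python
FIELD_SEP = '\x01'  # ^A: between top-level fields
ITEM_SEP = '\x02'   # ^B: between entries of a map
KEY_SEP = '\x03'    # ^C: between a key and its value


def encode_map(mapping):
    """Join a dict of strings the way a default-delimited map column looks."""
    # Keys are sorted only to make the output deterministic for this demo.
    return ITEM_SEP.join(
        key + KEY_SEP + value for key, value in sorted(mapping.items())
    )


def encode_row(fields):
    """Join string and dict fields into one text line (without the newline)."""
    return FIELD_SEP.join(
        encode_map(field) if isinstance(field, dict) else field
        for field in fields
    )


line = encode_row(['user42', {'color': 'red', 'size': 'L'}])
print(repr(line))  # 'user42\x01color\x03red\x02size\x03L'
```

Splitting such a line on ^A gives the fields, and the second field is the kind of string a map-parsing UDF would receive.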
On Tue, Mar 12, 2013 at 2:53 PM, Dmitriy Ryaboy <dv...@gmail.com> wrote:
> How does LazySimpleSerde store data?
>
>
> On Tue, Mar 12, 2013 at 11:17 AM, Shawn Hermans <shawnhermans@gmail.com> wrote:
>
> > All,
> > Is there an easy way to read Hive LazySimpleSerde encoded files in Pig? I
> > did some research and found support for Hive's columnar format and for
> > SequenceFiles, but did not see anything for LazySimpleSerde.
> >
> > Thanks,
> > Shawn
> >
>
Re: Read Hive LazySimpleSerde with Pig
Posted by Dmitriy Ryaboy <dv...@gmail.com>.
How does LazySimpleSerde store data?
On Tue, Mar 12, 2013 at 11:17 AM, Shawn Hermans <sh...@gmail.com> wrote:
> All,
> Is there an easy way to read Hive LazySimpleSerde encoded files in Pig? I
> did some research and found support for Hive's columnar format and for
> SequenceFiles, but did not see anything for LazySimpleSerde.
>
> Thanks,
> Shawn
>