Posted to user@hive.apache.org by Leo Alekseyev <dn...@gmail.com> on 2010/07/13 02:08:45 UTC

Hive and protocol buffers -- are there UDFs for dealing with them?

Hi all,
I was wondering if anyone is using Hive with protocol buffers.  The
Hadoop wiki links to
http://www.slideshare.net/ragho/hive-user-meeting-august-2009-facebook
for SerDe examples; there it says that there is no built-in support
for protobufs.  Since this presentation is about a year old, I was
wondering whether any UDFs, native or third-party, have appeared
since then to deal with them.

I am also curious about the relative efficiency of performing SerDe
using UDFs in Hive vs. running a separate Hadoop job to first
deserialize the data from protocol buffers into an ASCII flat file
with only the "interesting" fields (going from ~15 fields to ~3), and
then doing the rest of the computation in Hive.  I'd appreciate any
comments!
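
The pre-extraction step described above could be sketched roughly as
follows. This is only an illustration, not a tested pipeline: it assumes
each record has already been decoded from its protocol buffer into a
Python dict (the decoding itself is not shown), and the field names
user_id, ts, and url are hypothetical stand-ins for the ~3 interesting
fields. The output is tab-separated text, which a Hive table declared
with FIELDS TERMINATED BY '\t' should be able to read.

```python
# Hypothetical names for the ~3 "interesting" fields; not from the real data.
INTERESTING_FIELDS = ("user_id", "ts", "url")

# Marker that Hive text tables treat as NULL by default.
HIVE_NULL = "\\N"


def project_record(record, fields=INTERESTING_FIELDS):
    """Return one tab-separated line containing only the selected fields.

    `record` is assumed to be a dict produced by an earlier protobuf
    decoding step that is outside the scope of this sketch.
    """
    values = []
    for name in fields:
        value = record.get(name)
        if value is None:
            values.append(HIVE_NULL)
        else:
            # Replace the delimiter and newlines so each record stays on one line.
            values.append(str(value).replace("\t", " ").replace("\n", " "))
    return "\t".join(values)


def project_all(records, fields=INTERESTING_FIELDS):
    """Project every record, yielding one flat line per record."""
    for record in records:
        yield project_record(record, fields)
```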

Thanks,
--Leo

Re: Hive and protocol buffers -- are there UDFs for dealing with them?

Posted by Yang <te...@gmail.com>.
To your latter question: I guess that depends on whether you have
column-wise storage; column-wise storage might need a specific
SerDe.

Re: Hive and protocol buffers -- are there UDFs for dealing with them?

Posted by Zheng Shao <zs...@gmail.com>.
If you only need to scan the data once, it makes sense to use a Hive
SerDe to read the data directly (which saves you one I/O round trip).

If you need to read the data multiple times, then it's better to first
save the 3 columns into separate files.
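
The trade-off above can be made concrete with a rough back-of-the-envelope
model. This is a sketch under simplifying assumptions, not a measurement:
it counts only bytes read and written, ignores deserialization CPU and job
startup cost, and the function names and sizes are invented for
illustration.

```python
def direct_scan_io(full_bytes, num_scans):
    """Bytes of I/O when every query deserializes the full protobuf data."""
    return full_bytes * num_scans


def extract_then_scan_io(full_bytes, extracted_bytes, num_scans):
    """Bytes of I/O when the interesting columns are extracted first.

    One pass reads the full data and writes the extract; every later
    query reads only the extract.
    """
    return full_bytes + extracted_bytes + extracted_bytes * num_scans


def cheaper_option(full_bytes, extracted_bytes, num_scans):
    """Return "direct" or "extract", whichever moves fewer bytes in this model."""
    direct = direct_scan_io(full_bytes, num_scans)
    extracted = extract_then_scan_io(full_bytes, extracted_bytes, num_scans)
    return "direct" if direct <= extracted else "extract"
```

With a hypothetical 100 GB of input and a 20 GB extract, one scan costs
100 GB when read directly versus 140 GB with extraction, while two scans
cost 200 GB versus 160 GB, so in this model extraction pays off from the
second scan onward.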

Zheng

-- 
Yours,
Zheng
http://www.linkedin.com/in/zshao