Posted to user@hive.apache.org by Koert Kuipers <ko...@tresata.com> on 2012/02/21 16:37:19 UTC

2 questions about SerDe

1) Is there a way in initialize() of a SerDe to know whether it is being used
as a Serializer or a Deserializer? If not, can I define the Serializer and
Deserializer separately instead of defining a SerDe (so that I have two
initialize() methods)?

2) Is there a way to find out which columns are being used? Say someone
runs "select a,b,c from test" and my SerDe gets initialized for use in
that query: how can I know that only a, b, and c are needed? I would like to
take advantage of this information so I don't deserialize unnecessary
data, without having to resort to more complex lazy deserialization
tactics.

Re: 2 questions about SerDe

Posted by Roberto Congiu <ro...@openx.com>.
Have a look at the code for the LazySerDes. When you deserialize in the
SerDe, you don't actually have to deserialize all the columns. deserialize()
could return an object that has not actually been deserialized, and you can
write an ObjectInspector that deserializes a field from that structure only
when it's needed (that is, when the ObjectInspector is called).
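
A rough sketch of that pattern in plain Java, with no Hive dependency, might
look like the following. All names here (LazyRowSketch, LazyRow,
LazyRowInspector, getFieldData) are hypothetical stand-ins, not Hive classes;
in a real SerDe the row object would come back from deserialize() and the
field lookup would live in a StructObjectInspector. The comma-separated
record format is also just an assumption for illustration.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class LazyRowSketch {

    /** Hypothetical stand-in for the object returned by deserialize(): it only stores the raw record. */
    static final class LazyRow {
        final String raw;
        final Map<Integer, String> parsed = new HashMap<>(); // fields extracted so far

        LazyRow(String raw) { this.raw = raw; }
    }

    /** Hypothetical stand-in for a struct ObjectInspector: extracts one field on request. */
    static final class LazyRowInspector {
        private final List<String> columnNames;

        LazyRowInspector(List<String> columnNames) { this.columnNames = columnNames; }

        String getFieldData(LazyRow row, String column) {
            int index = columnNames.indexOf(column);
            if (index < 0) throw new IllegalArgumentException("unknown column: " + column);
            // The work happens here, on first access to this field, not in deserialize().
            // The result is cached so repeated access does not parse again.
            return row.parsed.computeIfAbsent(index, i -> row.raw.split(",", -1)[i]);
        }
    }

    /** The "deserialize" step does no parsing at all; it just wraps the record. */
    static LazyRow deserialize(String record) { return new LazyRow(record); }

    public static void main(String[] args) {
        LazyRowInspector inspector = new LazyRowInspector(Arrays.asList("a", "b", "c", "d"));
        LazyRow row = deserialize("1,2,3,4");
        // Only column "b" is requested, so only that field ends up extracted.
        System.out.println(inspector.getFieldData(row, "b"));
        System.out.println(row.parsed.keySet());
    }
}
```

With this arrangement the SerDe never needs to know which columns a query
uses: columns that are never asked for through the inspector are never
extracted.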

R.
