Posted to user@hive.apache.org by Rui Martins <ru...@ruibm.com> on 2013/10/14 14:16:54 UTC

Custom SerDe: Initialize() passes a null configuration to my Custom SerDe

Hi hive users,

I am writing a custom SerDe that can load any class generated by Protocol
Buffers. For flexibility, this class can live in a jar external to the
SerDe's jar, and I use the Hive Configuration object passed to initialize()
to load it dynamically and set the schema for the Hive table.

http://grepcode.com/file/repository.cloudera.com/content/repositories/releases/org.apache.hadoop.hive/hive-serde/0.7.0-cdh3u0/org/apache/hadoop/hive/serde2/Serializer.java#Serializer

When I use my custom SerDe as a Deserializer, it all works well: I get a
Configuration and correctly load the Protocol Buffers class from the
external jar.

However, when I use the SerDe as a Serializer, the Configuration is always
set to null so I have no way of loading the external class from the Jar.

My questions are:

1) Is the initialize(..) method in Serializer supposed to always pass a
   null Configuration?

2) Is there a way of creating or retrieving the current Hadoop/Hive
   Configuration when this parameter is passed as null?

Thank you,
rui
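
One way to sidestep question 2, offered as an assumption rather than
anything confirmed in this thread: if the location of the external jar is
known (for instance, passed in as a table property), the class can be loaded
without a Configuration at all. The sketch below uses only the JDK. The
ExternalJarLoader helper and its method names are hypothetical and are not
part of Hive or Hadoop.

```java
import java.net.MalformedURLException;
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper (not a Hive class): loads a class from a jar whose
// path is known, without needing a Hadoop/Hive Configuration.
final class ExternalJarLoader {

    // Builds a loader for the given jar. The parent is the thread context
    // class loader, so classes already on the classpath (such as the
    // protobuf runtime) are shared with the caller instead of reloaded.
    static URLClassLoader forJar(String jarPath) {
        Path jar = Paths.get(jarPath).toAbsolutePath();
        try {
            URL[] urls = { jar.toUri().toURL() };
            ClassLoader parent = Thread.currentThread().getContextClassLoader();
            return new URLClassLoader(urls, parent);
        } catch (MalformedURLException e) {
            throw new IllegalArgumentException("Not a usable jar path: " + jarPath, e);
        }
    }

    // Loads className, consulting the parent loader first and then the jar.
    // The loader is deliberately left open: closing it could prevent classes
    // that are loaded lazily later on from being found.
    static Class<?> load(String jarPath, String className) {
        try {
            return Class.forName(className, true, forJar(jarPath));
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(
                "Class " + className + " not found in " + jarPath, e);
        }
    }
}
```

A call such as ExternalJarLoader.load(jarPath, className) could then be made
from initialize() whether or not the Configuration argument is null.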

Re: Custom SerDe: Initialize() passes a null configuration to my Custom SerDe

Posted by Hari Subramaniyan <hs...@hortonworks.com>.
Please see
https://svn.apache.org/repos/asf/hive/trunk/serde/src/java/org/apache/hadoop/hive/serde2/avro/AvroSerDe.java
https://svn.apache.org/repos/asf/hive/trunk/ql/src/test/queries/clientpositive/
to see how a non-null conf is passed to initialize()

Thanks
Hari


On Mon, Oct 14, 2013 at 6:29 PM, Yin Huai <hu...@gmail.com> wrote:

> Can you try to set serde properties?
>
> https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-AddSerDeProperties
>
> I have not tried it, but it seems to be the right way to pass configuration
> to a SerDe class.
>
> Thanks,
>
> Yin


Re: Custom SerDe: Initialize() passes a null configuration to my Custom SerDe

Posted by Yin Huai <hu...@gmail.com>.
Can you try to set serde properties?
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-AddSerDeProperties

I have not tried it, but it seems to be the right way to pass configuration
to a SerDe class.

Thanks,

Yin


On Mon, Oct 14, 2013 at 8:20 AM, Rui Martins <ru...@ruibm.com> wrote:

> +dev hive mailing list that I should've mailed in the first place.
>
> (apologies for the spam)

Re: Custom SerDe: Initialize() passes a null configuration to my Custom SerDe

Posted by Rui Martins <ru...@ruibm.com>.
+dev hive mailing list that I should've mailed in the first place.

(apologies for the spam)



Re: Custom SerDe: Initialize() passes a null configuration to my Custom SerDe

Posted by Rui Martins <ru...@ruibm.com>.
Thank you all for the lightning fast replies.

@Yin: I am actually passing SerDe properties and they are working fine.
In fact, that is how I pass the fully qualified name of the class I want to
use. I receive the Properties object fine; it's the Configuration one that
is null. :(
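
As a rough illustration of the approach described above (the class name
arrives through the table Properties and is then loaded dynamically), here
is a minimal sketch in plain Java. The property key "protobuf.class" and the
ProtoClassResolver helper are hypothetical names, not part of Hive. The
fallback to the thread context class loader assumes the external jar is
visible there, which has not been verified in this thread.

```java
import java.util.Properties;

// Hypothetical helper (not a Hive class): resolves the class named in the
// table properties, tolerating a missing Configuration class loader.
final class ProtoClassResolver {

    // Assumed table property key; pick whatever key the SerDe documents.
    static final String CLASS_PROPERTY = "protobuf.class";

    // Prefers the loader obtained from the Configuration when there is one,
    // then the thread context loader, then this class's own loader.
    static ClassLoader pickLoader(ClassLoader fromConfiguration) {
        if (fromConfiguration != null) {
            return fromConfiguration;
        }
        ClassLoader context = Thread.currentThread().getContextClassLoader();
        return context != null ? context : ProtoClassResolver.class.getClassLoader();
    }

    // Reads the fully qualified class name from the properties and loads it.
    static Class<?> resolve(Properties tableProperties, ClassLoader fromConfiguration) {
        String className = tableProperties.getProperty(CLASS_PROPERTY);
        if (className == null || className.trim().isEmpty()) {
            throw new IllegalArgumentException("Missing table property: " + CLASS_PROPERTY);
        }
        try {
            return Class.forName(className.trim(), true, pickLoader(fromConfiguration));
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException("Cannot load class " + className, e);
        }
    }
}
```

When a Configuration is available, the loader returned by Hadoop's
Configuration.getClassLoader() could be passed as the second argument; when
it is null, passing null falls back to the other loaders.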

@Hari: Perfect Hari, I think this is exactly what I need.

@Edward: Thanks for the suggestion. I have seen that, and I think Twitter
also has one called Elephant Bird, but neither supports all the
functionality I need. I've implemented support for both Deserializer and
Serializer for Protocol Buffers (the former works flawlessly; the latter is
the one with the problem). My implementation also supports other data
formats (apart from Protocol Buffers), and I have implemented new input
and output file formats as well. Hopefully I'll be able to open-source this
soon.


On Mon, Oct 14, 2013 at 6:06 PM, Edward Capriolo <ed...@gmail.com> wrote:

> Have you seen?
>
> https://github.com/edwardcapriolo/hive-protobuf/

Re: Custom SerDe: Initialize() passes a null configuration to my Custom SerDe

Posted by Edward Capriolo <ed...@gmail.com>.
Have you seen?

https://github.com/edwardcapriolo/hive-protobuf/

