Posted to user@hive.apache.org by Johan Oskarsson <jo...@oskarsson.nu> on 2008/12/08 19:16:16 UTC

Serde and Record I/O

We store a lot of data in SequenceFiles with the key and value as
generated Jute/RecordIO files and would like to process it all using Hive.

I noticed that there is a serde/jute package, but I assume serde version
1 is deprecated in favour of serde2? Either way I get a class cast
exception if I try to use it.

I've looked through the mailing list and wiki but can't find a good
example of how to process SequenceFiles with RecordIO key/value classes.
Any help would be much appreciated.

/Johan

RE: Serde and Record I/O

Posted by Joydeep Sen Sarma <js...@facebook.com>.
Sorry - I think Ashish is working to get the patches in. I will take a look at HIVE-126 - we are missing Prasad, who wrote most of the current metastore code.

-----Original Message-----
From: Johan Oskarsson [mailto:johan@oskarsson.nu] 
Sent: Monday, December 08, 2008 10:53 AM
To: hive-user@hadoop.apache.org
Subject: Re: Serde and Record I/O

Thanks for the quick reply.

I have opened a ticket as suggested:
https://issues.apache.org/jira/browse/HIVE-133

While I have your attention, I would be very grateful if someone could
take a few minutes and commit the following patches; they're pretty
small and have already been reviewed:
https://issues.apache.org/jira/browse/HIVE-90
https://issues.apache.org/jira/browse/HIVE-114
https://issues.apache.org/jira/browse/HIVE-116
https://issues.apache.org/jira/browse/HIVE-101 (not a patch)

And I need some advice on this one:
https://issues.apache.org/jira/browse/HIVE-126

Thanks in advance!

/Johan


Re: Serde and Record I/O

Posted by Johan Oskarsson <jo...@oskarsson.nu>.
Thanks for the quick reply.

I have opened a ticket as suggested:
https://issues.apache.org/jira/browse/HIVE-133

While I have your attention, I would be very grateful if someone could
take a few minutes and commit the following patches; they're pretty
small and have already been reviewed:
https://issues.apache.org/jira/browse/HIVE-90
https://issues.apache.org/jira/browse/HIVE-114
https://issues.apache.org/jira/browse/HIVE-116
https://issues.apache.org/jira/browse/HIVE-101 (not a patch)

And I need some advice on this one:
https://issues.apache.org/jira/browse/HIVE-126

Thanks in advance!

/Johan


RE: Serde and Record I/O

Posted by Joydeep Sen Sarma <js...@facebook.com>.
Hi Johan - so the key and value class types are both RecordIO classes?

This may need some dev work. A few things:
- traditionally our serdes have ignored the keys altogether (the row is embedded in the value). What are the semantics for your case?
- the jute code was written for an older version of the serde interface and needs to be ported to the new interface
- finally - I am not sure about the current jute code (I am looking at it and the deserialization code is not making sense to me)

+1 on supporting this - please file a Jira - should be very easy to get this in.
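
To make the question about key semantics concrete, here is a minimal sketch of the two options: ignoring the key (the traditional behaviour described above) versus exposing the key's fields as leading columns. Everything in it is an assumption for illustration only. The length-prefixed field encoding is a toy format, not the RecordIO wire format, and the function names are hypothetical, not part of Hive's serde interface.

```python
import struct


def encode_fields(fields):
    """Toy encoding (NOT the RecordIO wire format): each field is a
    4-byte big-endian length followed by its UTF-8 bytes."""
    out = b""
    for field in fields:
        data = field.encode("utf-8")
        out += struct.pack(">I", len(data)) + data
    return out


def decode_fields(buf):
    """Inverse of encode_fields: walk the buffer field by field."""
    fields, pos = [], 0
    while pos < len(buf):
        (length,) = struct.unpack_from(">I", buf, pos)
        pos += 4
        fields.append(buf[pos:pos + length].decode("utf-8"))
        pos += length
    return fields


def deserialize_value_only(key, value):
    """Hypothetical 'traditional' semantics: the key is ignored and
    the whole row is taken from the value."""
    return decode_fields(value)


def deserialize_key_and_value(key, value):
    """Hypothetical alternative: the key's fields become the leading
    columns of the row, followed by the value's fields."""
    return decode_fields(key) + decode_fields(value)


# One key/value pair as it might come out of a SequenceFile reader.
key = encode_fields(["user42"])
value = encode_fields(["2008-12-08", "click"])

row_a = deserialize_value_only(key, value)
row_b = deserialize_key_and_value(key, value)
print(row_a)
print(row_b)
```

With value-only semantics the "user42" field never reaches the table, so if the key carries real data, the second variant (or something like it) is what would be needed.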
