Posted to user@hive.apache.org by Johan Oskarsson <jo...@oskarsson.nu> on 2008/12/08 19:16:16 UTC
Serde and Record I/O
We store a lot of data in SequenceFiles whose keys and values are
generated Jute/RecordIO classes, and we would like to process it all using Hive.
I noticed that there is a serde/jute package, but I assume serde version
1 is deprecated in favour of serde2? Either way, I get a ClassCastException
when I try to use it.
I've looked through the mailing list and wiki but can't find a good
example of how to process SequenceFiles with RecordIO key/value classes.
Any help would be much appreciated.
/Johan
RE: Serde and Record I/O
Posted by Joydeep Sen Sarma <js...@facebook.com>.
Sorry, I think Ashish is working on getting the patches in. I will take a look at HIVE-126; we are missing Prasad, who wrote most of the current metastore code.
-----Original Message-----
From: Johan Oskarsson [mailto:johan@oskarsson.nu]
Sent: Monday, December 08, 2008 10:53 AM
To: hive-user@hadoop.apache.org
Subject: Re: Serde and Record I/O
Thanks for the quick reply.
I have opened a ticket as suggested:
https://issues.apache.org/jira/browse/HIVE-133
While I have your attention, I would be very grateful if someone could
take a few minutes to commit the following patches; they're pretty
small and have already been reviewed:
https://issues.apache.org/jira/browse/HIVE-90
https://issues.apache.org/jira/browse/HIVE-114
https://issues.apache.org/jira/browse/HIVE-116
https://issues.apache.org/jira/browse/HIVE-101 (not a patch)
And I need some advice on this one:
https://issues.apache.org/jira/browse/HIVE-126
Thanks in advance!
/Johan
Joydeep Sen Sarma wrote:
> Hi Johan, so the key and value class types are RecordIO classes?
>
> This may need some dev work. A few things:
> - Traditionally our SerDes have ignored the keys altogether (the row is embedded in the value). What are the semantics in your case?
> - The Jute code was written for an older version of the SerDe interface and needs to be ported to the new interface.
> - Finally, I am not sure about the current Jute code (I am looking at it, and the deserialization code is not making sense to me).
>
> +1 on supporting this. Please file a JIRA; it should be very easy to get this in.
>
> -----Original Message-----
> From: Johan Oskarsson [mailto:johan@oskarsson.nu]
> Sent: Monday, December 08, 2008 10:16 AM
> To: hive-user@hadoop.apache.org
> Subject: Serde and Record I/O
>
> We store a lot of data in SequenceFiles whose keys and values are
> generated Jute/RecordIO classes, and we would like to process it all using Hive.
>
> I noticed that there is a serde/jute package, but I assume serde version
> 1 is deprecated in favour of serde2? Either way, I get a ClassCastException
> when I try to use it.
>
> I've looked through the mailing list and wiki but can't find a good
> example of how to process SequenceFiles with RecordIO key/value classes.
> Any help would be much appreciated.
>
> /Johan
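A minimal sketch of the key-handling question raised above, in plain Java with no Hive or Hadoop dependencies. It assumes each SequenceFile record has already been decoded into a list of key fields and a list of value fields; RecordRowMapper and KeyMode are hypothetical names used for illustration only, not part of Hive's serde or serde2 API. IGNORE_KEY mirrors the traditional behaviour described in the thread (the row comes from the value alone), while KEY_THEN_VALUE is one possible alternative semantic for RecordIO keys.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch, not Hive's SerDe API: two ways a deserializer could
 * turn one SequenceFile record (key, value) into a table row. Records are
 * modeled as plain field lists so the example runs on the JDK alone.
 */
public class RecordRowMapper {

    /** How the key participates in the row (hypothetical option). */
    enum KeyMode { IGNORE_KEY, KEY_THEN_VALUE }

    private final KeyMode mode;

    RecordRowMapper(KeyMode mode) {
        this.mode = mode;
    }

    /** Builds a row from already-decoded key and value fields. */
    List<Object> toRow(List<Object> keyFields, List<Object> valueFields) {
        List<Object> row = new ArrayList<>();
        if (mode == KeyMode.KEY_THEN_VALUE) {
            // Key columns come first, followed by the value columns.
            row.addAll(keyFields);
        }
        // In both modes the value supplies (the rest of) the row.
        row.addAll(valueFields);
        return row;
    }

    public static void main(String[] args) {
        List<Object> key = List.of("user-42");
        List<Object> value = List.of(1228759000L, "click");

        RecordRowMapper valueOnly = new RecordRowMapper(KeyMode.IGNORE_KEY);
        RecordRowMapper keyAndValue = new RecordRowMapper(KeyMode.KEY_THEN_VALUE);

        System.out.println(valueOnly.toRow(key, value));   // [1228759000, click]
        System.out.println(keyAndValue.toRow(key, value)); // [user-42, 1228759000, click]
    }
}
```

Prepending the key columns leaves the value's column order unchanged, so a table schema written for value-only rows would only need the key columns added at the front. Whether that is the right semantic for RecordIO keys is exactly the open question in the thread.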