Posted to user@hive.apache.org by Avrilia Floratou <fl...@cs.wisc.edu> on 2010/11/13 23:30:31 UTC

Convert data to BytesRefArrayWritable

Hi,

I want to convert data stored in a Hadoop sequence file to 
BytesRefArrayWritable so that I can use RCFileOutputFormat and create an 
RCFile.

My data contains integers, strings, and hashmaps, so I assume I don't 
have to write my own serializer/deserializer for these. I tried using 
the ColumnarSerDe serializer, but it serializes data that is stored in a 
struct. Should I use ColumnarStruct to hold the data? If so, how can I 
store each row of my dataset in this data structure? I don't see any 
methods to set the fields.

Should I use a different way to get the BytesRefArrayWritable that I need?

Thank you,
Avrilia

Re: Convert data to BytesRefArrayWritable

Posted by yongqiang he <he...@gmail.com>.
If your old data is already stored in Hive, you can just use "insert
overwrite" (into a table stored as RCFile) to convert the data.

If you want to generate an RCFile directly, RCFile has its own Writer
API. You can find examples in the TestRCFile test case in Hive.
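
For reference, here is a minimal sketch of that Writer API pattern, modeled
loosely on what TestRCFile does. The output path, the column count, and the
way each field is turned into bytes below are assumptions for illustration
only, not something from this thread:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hive.ql.io.RCFile;
import org.apache.hadoop.hive.ql.io.RCFileOutputFormat;
import org.apache.hadoop.hive.serde2.columnar.BytesRefArrayWritable;
import org.apache.hadoop.hive.serde2.columnar.BytesRefWritable;

public class RCFileWriteSketch {
  public static void main(String[] args) throws Exception {
    int numCols = 3;                                   // e.g. one int, one string, one map column
    Configuration conf = new Configuration();
    RCFileOutputFormat.setColumnNumber(conf, numCols); // writer needs the column count up front
    FileSystem fs = FileSystem.get(conf);
    RCFile.Writer writer =
        new RCFile.Writer(fs, conf, new Path("/tmp/example.rc")); // hypothetical output path

    // One row: each column's value serialized to bytes by your own code
    // (how you encode the hashmap column is entirely up to you).
    byte[][] rowBytes = {
        "1".getBytes("UTF-8"),
        "hello".getBytes("UTF-8"),
        "key1=val1,key2=val2".getBytes("UTF-8")
    };
    BytesRefArrayWritable row = new BytesRefArrayWritable(numCols);
    for (int i = 0; i < numCols; i++) {
      row.set(i, new BytesRefWritable(rowBytes[i], 0, rowBytes[i].length));
    }
    writer.append(row);                                // repeat for every row read from the sequence file
    writer.close();
  }
}

The writer only ever sees per-column byte arrays wrapped in a
BytesRefArrayWritable; reading the sequence file and deciding how each field
is encoded to bytes stays in your own code (or in a SerDe, if you prefer).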
