Posted to user@hive.apache.org by "Rubin, Bradley S." <BS...@stthomas.edu> on 2012/05/21 21:41:47 UTC

How to Write a Simple SerDe?

My Hadoop MR job emits a value with three primitives via a custom Writable (see below).  How do I write a corresponding custom SerDe so that Hive can read the output from HDFS?  I can find complex SerDe examples (RegEx, JSON), but I can't find something simple to model from.

I think that my create table should look like this, correct?

CREATE EXTERNAL TABLE rats (time INT, frequency SMALLINT, convolution FLOAT)
ROW FORMAT SERDE 'neurohadoop.RatSerde'
STORED AS SEQUENCEFILE LOCATION '/neuro/output/rats';

----------

import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;

import org.apache.hadoop.io.Writable;

public class RatWritable implements Writable {
  int timestamp;
  short frequency;
  float convolution;

  // Fields are read back in exactly the order write() emits them.
  @Override
  public void readFields(DataInput in) throws IOException {
    timestamp = in.readInt();
    frequency = in.readShort();
    convolution = in.readFloat();
  }

  @Override
  public void write(DataOutput out) throws IOException {
    out.writeInt(timestamp);
    out.writeShort(frequency);
    out.writeFloat(convolution);
  }
}
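
For completeness, the output side of a job like this gets wired up roughly as follows. RatOutputConfig is only a hypothetical helper for illustration, and the IntWritable key is a placeholder; only RatWritable, the SequenceFile output with Snappy block compression, and the /neuro/output/rats path come from this thread.

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.SequenceFile.CompressionType;
import org.apache.hadoop.io.compress.SnappyCodec;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.SequenceFileOutputFormat;

public final class RatOutputConfig {

  private RatOutputConfig() {}

  // Called from the job driver after the mapper/reducer are configured.
  public static void configureOutput(Job job) {
    job.setOutputKeyClass(IntWritable.class);    // Hive ignores the SequenceFile key
    job.setOutputValueClass(RatWritable.class);  // one RatWritable per table row
    job.setOutputFormatClass(SequenceFileOutputFormat.class);

    // Block-compressed SequenceFile with Snappy.
    FileOutputFormat.setCompressOutput(job, true);
    FileOutputFormat.setOutputCompressorClass(job, SnappyCodec.class);
    SequenceFileOutputFormat.setOutputCompressionType(job, CompressionType.BLOCK);

    // Same directory the external Hive table points at.
    FileOutputFormat.setOutputPath(job, new Path("/neuro/output/rats"));
  }
}

The key bytes are still stored in the SequenceFile even though Hive never sees them, so a small key type keeps the records compact.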

-- Brad

Re: How to Write a Simple SerDe?

Posted by "Rubin, Bradley S." <BS...@stthomas.edu>.
I ended up slogging through this, and got it working.  Here is the code for the custom writable and the corresponding custom SerDe, in case it helps others trying to do the same thing: http://pastebin.com/xUy36Kxg .
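
In case that pastebin link goes stale: a minimal read-only SerDe for this layout comes out looking roughly like the sketch below. This is a from-scratch illustration against the org.apache.hadoop.hive.serde2.SerDe interface, not the exact code from the paste, and it assumes RatSerde lives in the neurohadoop package next to RatWritable so it can read the fields directly.

package neurohadoop;

import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Properties;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hive.serde2.SerDe;
import org.apache.hadoop.hive.serde2.SerDeException;
import org.apache.hadoop.hive.serde2.SerDeStats;
import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector;
import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorFactory;
import org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorFactory;
import org.apache.hadoop.io.Writable;

public class RatSerde implements SerDe {

  private ObjectInspector inspector;
  // One reusable row object, one slot per table column (time, frequency, convolution).
  private final List<Object> row = Arrays.asList(new Object[3]);

  @Override
  public void initialize(Configuration conf, Properties tbl) throws SerDeException {
    // Column names and types are hard-coded to match the CREATE TABLE;
    // a more general SerDe would read them from the table properties.
    List<String> names = Arrays.asList("time", "frequency", "convolution");
    List<ObjectInspector> inspectors = new ArrayList<ObjectInspector>();
    inspectors.add(PrimitiveObjectInspectorFactory.javaIntObjectInspector);
    inspectors.add(PrimitiveObjectInspectorFactory.javaShortObjectInspector);
    inspectors.add(PrimitiveObjectInspectorFactory.javaFloatObjectInspector);
    inspector = ObjectInspectorFactory.getStandardStructObjectInspector(names, inspectors);
  }

  @Override
  public Object deserialize(Writable blob) throws SerDeException {
    // Hive hands the SequenceFile value to the SerDe; the key never shows up here.
    RatWritable rat = (RatWritable) blob;
    row.set(0, rat.timestamp);
    row.set(1, rat.frequency);
    row.set(2, rat.convolution);
    return row;
  }

  @Override
  public ObjectInspector getObjectInspector() throws SerDeException {
    return inspector;
  }

  // The write path is unused for an external table that is only read.
  @Override
  public Class<? extends Writable> getSerializedClass() {
    return RatWritable.class;
  }

  @Override
  public Writable serialize(Object obj, ObjectInspector objInspector) throws SerDeException {
    throw new SerDeException("RatSerde is read-only");
  }

  @Override
  public SerDeStats getSerDeStats() {
    return null;
  }
}

Reusing one row list across deserialize() calls mirrors what the built-in SerDes do and avoids allocating a fresh object for every record.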

It dropped the average bytes/record from 30.5 (with a CSV text string) to 18.2.  Snappy block compression was enabled in both cases.  So, this was worth the effort.

It would be great if Hive could just read the Writable and treat it as the SerDe, at least for simple cases like this, or at least have a tool that, given the Writable, generates the corresponding SerDe.

-- Brad
