
Posted to common-user@hadoop.apache.org by "W.P. McNeill" <bi...@gmail.com> on 2011/08/17 22:54:33 UTC

What is the best way to make a custom serialization JAR visible to Hadoop?

Please disregard my earlier email. I accidentally sent it before I was done
writing.

I am working with data that uses a custom I/O serialization, so I've added
a MySerialization class to the io.serializations property in
mapred-site.xml:

<property>
  <name>io.serializations</name>
  <value>MySerialization,org.apache.hadoop.io.serializer.WritableSerialization</value>
</property>

When I write a Hadoop job that uses this data type, I just make sure to
include MySerialization in the job's JAR. However, suppose I want standard
Hadoop tools such as the fs shell (hadoop fs) or Streaming to understand
this serialization type as well. For that, the class has to be in a JAR the
Hadoop framework itself can see. What is the best way to do this? So far
I've been adding the JAR to HADOOP_CLASSPATH in hadoop-env.sh.

Re: What is the best way to make a custom serialization JAR visible to Hadoop?

Posted by Harsh J <ha...@cloudera.com>.

Putting the JAR in a location on HADOOP_CLASSPATH that exists on all nodes
is the best option, unless you would rather add it to the distributed cache
every time you run such a job.

Note that you'll have to restart your TaskTrackers (TTs) for them to pick up
the updated classpath.
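A minimal sketch of what the hadoop-env.sh change might look like, assuming
the JAR has been copied to the same path on every node. The path
/opt/hadoop-extras/my-serialization.jar and the variable name
MY_SERIALIZATION_JAR are hypothetical placeholders, not anything Hadoop
defines.

```shell
# Sketch for conf/hadoop-env.sh. Assumption: the JAR sits at this same
# (hypothetical) path on every node in the cluster.
MY_SERIALIZATION_JAR=/opt/hadoop-extras/my-serialization.jar

# Append the JAR to HADOOP_CLASSPATH. ${VAR:+...} adds the separating colon
# only when HADOOP_CLASSPATH is already non-empty, so no empty classpath
# entry (which Java would treat as the current directory) is created.
export HADOOP_CLASSPATH="${HADOOP_CLASSPATH:+$HADOOP_CLASSPATH:}$MY_SERIALIZATION_JAR"
```

After pushing the edited hadoop-env.sh to every node, each TaskTracker still
has to be restarted; on 0.20/1.x-era installs one way is running
bin/hadoop-daemon.sh stop tasktracker followed by
bin/hadoop-daemon.sh start tasktracker on each node.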

-- 
Harsh J