Posted to common-user@hadoop.apache.org by Rahul Bhattacharjee <ra...@gmail.com> on 2013/04/30 15:21:35 UTC

Hadoop Avro Question

Hi,

When dealing with Avro data files in MR jobs, we use AvroMapper. I noticed
that the output K and V of AvroMapper aren't Writable, and the key isn't
Comparable either (these are AvroKey and AvroValue). As the general
serialization mechanism is Writable, how do the K,V pairs travel across
nodes in the case of Avro?

Thanks,
Rahul

Re: Hadoop Avro Question

Posted by Harsh J <ha...@cloudera.com>.
Oops, moving for sure this time :)

On Wed, May 1, 2013 at 10:35 AM, Harsh J <ha...@cloudera.com> wrote:
> Moving the question to Apache Avro's user@ lists. Please use the right
> lists for the most relevant answers.
>
> Avro is a different serialization technique that intends to replace
> the Writable serialization defaults in Hadoop. MR accepts a list of
> serializers it can use for its key/value structures and isn't limited
> to Writable in any way. Look up the property "io.serializations" in
> your Hadoop's core-default.xml for more information.
>
> The Avro project also offers fast comparator classes that are used for
> comparing the bytes/structures of Avro objects. This is mostly
> auto-set for you when you use the MR framework as described at
> http://avro.apache.org/docs/current/api/java/org/apache/avro/mapred/package-summary.html
> (via AvroJob helper class).
>
> On Tue, Apr 30, 2013 at 6:51 PM, Rahul Bhattacharjee
> <ra...@gmail.com> wrote:
>> Hi,
>>
>> When dealing with Avro data files in MR jobs, we use AvroMapper. I noticed
>> that the output K and V of AvroMapper aren't Writable, and the key isn't
>> Comparable either (these are AvroKey and AvroValue). As the general
>> serialization mechanism is Writable, how do the K,V pairs travel across
>> nodes in the case of Avro?
>>
>> Thanks,
>> Rahul
>
>
>
> --
> Harsh J



-- 
Harsh J

Re: Hadoop Avro Question

Posted by Harsh J <ha...@cloudera.com>.
For the TotalOrderPartitioner (TOP), check out
https://issues.apache.org/jira/browse/MAPREDUCE-4574, which we fixed in
Hadoop recently to allow full reuse with Avro.

On Wed, May 1, 2013 at 10:49 AM, Rahul Bhattacharjee
<ra...@gmail.com> wrote:
> Hi Harsh,
> It looks like a lot of other classes also need to be rewritten for Avro,
> such as the total sort partitioner, which I think currently assumes
> Writable as the I/O mechanism.
>
> I faced problems using it with Avro, so I thought of writing to the forum.
>
> Thanks a lot
> Rahul!
>
>
> On Wed, May 1, 2013 at 10:35 AM, Harsh J <ha...@cloudera.com> wrote:
>>
>> Moving the question to Apache Avro's user@ lists. Please use the right
>> lists for the most relevant answers.
>>
>> Avro is a different serialization technique that intends to replace
>> the Writable serialization defaults in Hadoop. MR accepts a list of
>> serializers it can use for its key/value structures and isn't limited
>> to Writable in any way. Look up the property "io.serializations" in
>> your Hadoop's core-default.xml for more information.
>>
>> The Avro project also offers fast comparator classes that are used for
>> comparing the bytes/structures of Avro objects. This is mostly
>> auto-set for you when you use the MR framework as described at
>>
>> http://avro.apache.org/docs/current/api/java/org/apache/avro/mapred/package-summary.html
>> (via AvroJob helper class).
>>
>> On Tue, Apr 30, 2013 at 6:51 PM, Rahul Bhattacharjee
>> <ra...@gmail.com> wrote:
>> > Hi,
>> >
>> > When dealing with Avro data files in MR jobs, we use AvroMapper. I noticed
>> > that the output K and V of AvroMapper aren't Writable, and the key isn't
>> > Comparable either (these are AvroKey and AvroValue). As the general
>> > serialization mechanism is Writable, how do the K,V pairs travel across
>> > nodes in the case of Avro?
>> >
>> > Thanks,
>> > Rahul
>>
>>
>>
>> --
>> Harsh J
>
>



--
Harsh J

Re: Hadoop Avro Question

Posted by Harsh J <ha...@cloudera.com>.
Moving the question to Apache Avro's user@ lists. Please use the right
lists for the most relevant answers.

Avro is a different serialization technique that intends to replace
the Writable serialization defaults in Hadoop. MR accepts a list of
serializers it can use for its key/value structures and isn't limited
to Writable in any way. Look up the property "io.serializations" in
your Hadoop's core-default.xml for more information.
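
As a rough illustration of what a configurable list of serializers means,
here is a toy model in plain Java. It is not Hadoop's code: every name in
it (Serialization, ToyWritable, ToyText, ToyAvroKey, lookup) is made up
for the sketch. It assumes the framework walks the configured list in
order and uses the first entry that accepts the key or value class, which
is how the "io.serializations" lookup is generally understood to behave.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;

public class SerializationLookupSketch {

    // Hypothetical stand-in for a pluggable serialization entry.
    interface Serialization {
        boolean accept(Class<?> c);      // can this entry handle the class?
        byte[] serialize(Object value);  // turn a value into bytes for the shuffle
    }

    // Hypothetical "Writable-like" contract: the type serializes itself.
    interface ToyWritable {
        byte[] toBytes();
    }

    static final class ToyText implements ToyWritable {
        final String s;
        ToyText(String s) { this.s = s; }
        public byte[] toBytes() { return s.getBytes(StandardCharsets.UTF_8); }
    }

    // Hypothetical wrapper that does NOT implement ToyWritable,
    // playing the role AvroKey plays in the question.
    static final class ToyAvroKey {
        final long datum;
        ToyAvroKey(long datum) { this.datum = datum; }
    }

    static final class ToyWritableSerialization implements Serialization {
        public boolean accept(Class<?> c) { return ToyWritable.class.isAssignableFrom(c); }
        public byte[] serialize(Object v) { return ((ToyWritable) v).toBytes(); }
    }

    static final class ToyAvroSerialization implements Serialization {
        public boolean accept(Class<?> c) { return ToyAvroKey.class.isAssignableFrom(c); }
        public byte[] serialize(Object v) {
            // Toy encoding only; real Avro uses its own schema-driven binary format.
            return ByteBuffer.allocate(Long.BYTES).putLong(((ToyAvroKey) v).datum).array();
        }
    }

    // Return the first configured entry that accepts the class.
    static Serialization lookup(List<Serialization> configured, Class<?> c) {
        for (Serialization s : configured) {
            if (s.accept(c)) {
                return s;
            }
        }
        throw new IllegalArgumentException("No serialization accepts " + c.getName());
    }

    public static void main(String[] args) {
        // Plays the role of the comma-separated "io.serializations" value.
        List<Serialization> configured = Arrays.<Serialization>asList(
                new ToyWritableSerialization(), new ToyAvroSerialization());

        System.out.println(lookup(configured, ToyText.class).getClass().getSimpleName());
        System.out.println(lookup(configured, ToyAvroKey.class).getClass().getSimpleName());
    }
}
```

The point of the sketch is that a key type never has to implement the
Writable-style contract itself, as long as some configured entry accepts
its class.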

The Avro project also offers fast comparator classes that are used for
comparing the bytes/structures of Avro objects. This is mostly
auto-set for you when you use the MR framework as described at
http://avro.apache.org/docs/current/api/java/org/apache/avro/mapred/package-summary.html
(via AvroJob helper class).
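
To make "comparing the bytes" concrete, here is a small self-contained
Java sketch of sorting keys in their serialized form, without rebuilding
the objects. It is only an illustration under stated assumptions: the
class name, the encode/decode helpers and the fixed 8-byte encoding are
invented for the example, and Avro's real comparator works from the
schema and Avro's own binary encoding. The compareRaw parameter layout
(buffer, start, length for each side) mirrors the shape of Hadoop's
RawComparator interface.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

public class RawCompareSketch {

    // Toy encoding (an assumption of this sketch): 8-byte big-endian long with
    // the sign bit flipped, so unsigned byte order matches numeric order.
    static byte[] encode(long v) {
        return ByteBuffer.allocate(Long.BYTES).putLong(v ^ Long.MIN_VALUE).array();
    }

    // Inverse of encode, used only to print the sorted result.
    static long decode(byte[] b) {
        return ByteBuffer.wrap(b).getLong() ^ Long.MIN_VALUE;
    }

    // Compare two serialized keys byte by byte as unsigned values,
    // never reconstructing the original longs.
    static int compareRaw(byte[] b1, int s1, int l1, byte[] b2, int s2, int l2) {
        int n = Math.min(l1, l2);
        for (int i = 0; i < n; i++) {
            int x = b1[s1 + i] & 0xff;
            int y = b2[s2 + i] & 0xff;
            if (x != y) {
                return x - y;
            }
        }
        return l1 - l2;
    }

    public static void main(String[] args) {
        byte[][] keys = { encode(42L), encode(-7L), encode(3L) };

        // Sort the serialized keys directly, as a shuffle sort would.
        Arrays.sort(keys, (a, b) -> compareRaw(a, 0, a.length, b, 0, b.length));

        for (byte[] k : keys) {
            System.out.println(decode(k)); // prints -7, then 3, then 42
        }
    }
}
```

Because the comparison works on the bytes, the key class itself does not
need to be Comparable, which is the situation described for AvroKey.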

On Tue, Apr 30, 2013 at 6:51 PM, Rahul Bhattacharjee
<ra...@gmail.com> wrote:
> Hi,
>
> When dealing with Avro data files in MR jobs, we use AvroMapper. I noticed
> that the output K and V of AvroMapper aren't Writable, and the key isn't
> Comparable either (these are AvroKey and AvroValue). As the general
> serialization mechanism is Writable, how do the K,V pairs travel across
> nodes in the case of Avro?
>
> Thanks,
> Rahul



-- 
Harsh J
