Posted to user@hadoop.apache.org by Rahul Bhattacharjee <ra...@gmail.com> on 2012/08/22 07:39:36 UTC

Hadoop's Avro dependencies.

Hi,

I was going through the Apache Hadoop distribution's dependencies (the
jars in the lib folder) and I could not find avro-1.x.x.jar.

I thought Hadoop internally uses Avro as its serialization mechanism for
intermediate data transmission (transporting map outputs to reducers,
etc.), so the Hadoop distribution must have Avro within it. But it doesn't!

Can someone enlighten me on this?

Thanks,
Rahul

Re: Hadoop's Avro dependencies.

Posted by Rahul Bhattacharjee <ra...@gmail.com>.
Thanks a lot Harsh! I should get the code instead of bothering people here
in the group.

Rgds,
Rahul

On Wed, Aug 22, 2012 at 11:46 AM, Harsh J <ha...@cloudera.com> wrote:

> Hi,
>
> By default, only the Writable serialization technique is used. If you
> choose to use Avro in your job, only then Avro serialization is
> utilized at the intermediate serialization step.
>
> On Wed, Aug 22, 2012 at 11:42 AM, Rahul Bhattacharjee
> <ra...@gmail.com> wrote:
> > Well , thanks a lot Harsh. I though avro was result of hadoop's
> > serialization needs.
> >
> > If avro isn't used for serializing maps outputs and transfer it to other
> > reducers then whats used for this , if not avro.
> >
> > Thanks,
> > Rahul
> >
> > On Wed, Aug 22, 2012 at 11:22 AM, Harsh J <ha...@cloudera.com> wrote:
> >>
> >> Hi,
> >>
> >> Hadoop doesn't use Avro serialization on its own. However, Hadoop 2.x
> >> does provide an AvroSerialization class you can use optionally to
> >> serialize using Avro libraries, and the 2.x distribution does ship an
> >> Avro jar along with it.
> >>
> >> On Wed, Aug 22, 2012 at 11:09 AM, Rahul Bhattacharjee
> >> <ra...@gmail.com> wrote:
> >> > Hi,
> >> >
> >> > I was going through the Apache Hadoop's distribution dependencies
> >> > (jars in lib folder) and I could not find avro-1.x.x.jar.
> >> >
> >> > I though hadoop internally uses avro as its serialization mechanism
> >> > for intermediate data transmission (transporting maps output to
> >> > reducers etc), so hadoop distribution must have avro within it. But
> >> > it doesn't !
> >> >
> >> > Can someone enlighten me on this?
> >> >
> >> > Thanks,
> >> > Rahul
> >> >
> >>
> >>
> >>
> >> --
> >> Harsh J
> >
> >
>
>
>
> --
> Harsh J
>

Re: Hadoop's Avro dependencies.

Posted by Harsh J <ha...@cloudera.com>.
Bertrand,

For the inter-node part, we've been using Writables as well, but from
2.x onwards it has switched to Protocol Buffers. And you're right, this
shouldn't interfere with user tasks if they choose to send a different
protobuf version along.

On Wed, Aug 22, 2012 at 11:57 AM, Bertrand Dechoux <de...@gmail.com> wrote:
> Also, if I am not mistaken, there is 2 kind of serializations in Hadoop.
>
> There is the one used by MapReduce for transmitting user data : Writable by
> default, indeed.
>
> But there is also what is used for inter-node communication. I can't
> remember this one though. But it implies that you could have another
> dependency having no direct impact on the user.
>
> Regards
>
> Bertrand
>
>
> On Wed, Aug 22, 2012 at 8:16 AM, Harsh J <ha...@cloudera.com> wrote:
>>
>> Hi,
>>
>> By default, only the Writable serialization technique is used. If you
>> choose to use Avro in your job, only then Avro serialization is
>> utilized at the intermediate serialization step.
>>
>> On Wed, Aug 22, 2012 at 11:42 AM, Rahul Bhattacharjee
>> <ra...@gmail.com> wrote:
>> > Well , thanks a lot Harsh. I though avro was result of hadoop's
>> > serialization needs.
>> >
>> > If avro isn't used for serializing maps outputs and transfer it to other
>> > reducers then whats used for this , if not avro.
>> >
>> > Thanks,
>> > Rahul
>> >
>> > On Wed, Aug 22, 2012 at 11:22 AM, Harsh J <ha...@cloudera.com> wrote:
>> >>
>> >> Hi,
>> >>
>> >> Hadoop doesn't use Avro serialization on its own. However, Hadoop 2.x
>> >> does provide an AvroSerialization class you can use optionally to
>> >> serialize using Avro libraries, and the 2.x distribution does ship an
>> >> Avro jar along with it.
>> >>
>> >> On Wed, Aug 22, 2012 at 11:09 AM, Rahul Bhattacharjee
>> >> <ra...@gmail.com> wrote:
>> >> > Hi,
>> >> >
>> >> > I was going through the Apache Hadoop's distribution dependencies
>> >> > (jars
>> >> > in
>> >> > lib folder) and I could not find avro-1.x.x.jar.
>> >> >
>> >> > I though hadoop internally uses avro as its serialization mechanism
>> >> > for
>> >> > intermediate data transmission (transporting maps output to reducers
>> >> > etc
>> >> > ),
>> >> > so hadoop distribution must have avro within it. But it doesn't !
>> >> >
>> >> > Can someone enlighten me on this?
>> >> >
>> >> > Thanks,
>> >> > Rahul
>> >> >
>> >>
>> >>
>> >>
>> >> --
>> >> Harsh J
>> >
>> >
>>
>>
>>
>> --
>> Harsh J
>
>
>
>
> --
> Bertrand Dechoux



-- 
Harsh J

Re: Hadoop's Avro dependencies.

Posted by Bertrand Dechoux <de...@gmail.com>.
Also, if I am not mistaken, there are 2 kinds of serialization in Hadoop.

There is the one used by MapReduce for transmitting user data: Writable by
default, indeed.
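
To make the Writable contract concrete, here is a minimal sketch in plain
Java. It does not use the Hadoop jars: SimpleWritable is a hypothetical
stand-in that mirrors the two methods of Hadoop's Writable interface
(write(DataOutput) and readFields(DataInput)), and WordCount is an
invented record type, not something from this thread.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

public class WritableSketch {

    // Hypothetical stand-in for org.apache.hadoop.io.Writable (assumption:
    // only the two method shapes are mirrored here, nothing else).
    interface SimpleWritable {
        void write(DataOutput out) throws IOException;
        void readFields(DataInput in) throws IOException;
    }

    // Invented (word, count) record, similar in spirit to a map output pair.
    static class WordCount implements SimpleWritable {
        String word = "";
        int count;

        WordCount() {}

        WordCount(String word, int count) {
            this.word = word;
            this.count = count;
        }

        @Override
        public void write(DataOutput out) throws IOException {
            out.writeUTF(word);   // 2-byte length prefix + encoded characters
            out.writeInt(count);  // 4 bytes, big-endian
        }

        @Override
        public void readFields(DataInput in) throws IOException {
            // Fields must be read back in the same order they were written.
            word = in.readUTF();
            count = in.readInt();
        }
    }

    // Serializes a record into a byte array, as a map side would before shipping it.
    static byte[] serialize(SimpleWritable w) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            w.write(out);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // Rebuilds a record from bytes, as a reduce side would after fetching them.
    static WordCount deserialize(byte[] data) {
        WordCount w = new WordCount();
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            w.readFields(in);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return w;
    }

    public static void main(String[] args) {
        byte[] wire = serialize(new WordCount("hadoop", 3));
        WordCount copy = deserialize(wire);
        System.out.println(copy.word + " " + copy.count + " (" + wire.length + " bytes)");
    }
}
```

The point of the sketch is that the record itself knows how to write and
read its fields, with no schema shipped alongside the bytes. That is the
main difference from Avro, where a schema describes the layout.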

But there is also what is used for inter-node communication. I can't
remember this one though. But it implies that you could have another
dependency having no direct impact on the user.

Regards

Bertrand

On Wed, Aug 22, 2012 at 8:16 AM, Harsh J <ha...@cloudera.com> wrote:

> Hi,
>
> By default, only the Writable serialization technique is used. If you
> choose to use Avro in your job, only then Avro serialization is
> utilized at the intermediate serialization step.
>
> On Wed, Aug 22, 2012 at 11:42 AM, Rahul Bhattacharjee
> <ra...@gmail.com> wrote:
> > Well , thanks a lot Harsh. I though avro was result of hadoop's
> > serialization needs.
> >
> > If avro isn't used for serializing maps outputs and transfer it to other
> > reducers then whats used for this , if not avro.
> >
> > Thanks,
> > Rahul
> >
> > On Wed, Aug 22, 2012 at 11:22 AM, Harsh J <ha...@cloudera.com> wrote:
> >>
> >> Hi,
> >>
> >> Hadoop doesn't use Avro serialization on its own. However, Hadoop 2.x
> >> does provide an AvroSerialization class you can use optionally to
> >> serialize using Avro libraries, and the 2.x distribution does ship an
> >> Avro jar along with it.
> >>
> >> On Wed, Aug 22, 2012 at 11:09 AM, Rahul Bhattacharjee
> >> <ra...@gmail.com> wrote:
> >> > Hi,
> >> >
> >> > I was going through the Apache Hadoop's distribution dependencies
> >> > (jars in lib folder) and I could not find avro-1.x.x.jar.
> >> >
> >> > I though hadoop internally uses avro as its serialization mechanism
> >> > for intermediate data transmission (transporting maps output to
> >> > reducers etc), so hadoop distribution must have avro within it. But
> >> > it doesn't !
> >> >
> >> > Can someone enlighten me on this?
> >> >
> >> > Thanks,
> >> > Rahul
> >> >
> >>
> >>
> >>
> >> --
> >> Harsh J
> >
> >
>
>
>
> --
> Harsh J
>



-- 
Bertrand Dechoux

Re: Hadoop's Avro dependencies.

Posted by Harsh J <ha...@cloudera.com>.
Hi,

By default, only the Writable serialization technique is used. If you
choose to use Avro in your job, only then Avro serialization is
utilized at the intermediate serialization step.
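
The "Writable by default, Avro only if you choose it" behaviour can be
pictured as a list of registered serializations, where the first one that
accepts a given class wins. The sketch below is plain Java with invented
names (SerializationPlugin, WritableLike, AvroRecordLike, choose); it only
imitates the idea and is not Hadoop's actual code. In Hadoop itself the
list is, as far as I know, read from the io.serializations property.

```java
import java.util.ArrayList;
import java.util.List;

public class SerializationChoiceSketch {

    // Hypothetical marker types standing in for Writable and Avro record classes.
    interface WritableLike {}
    interface AvroRecordLike {}

    // Hypothetical plug-in contract: each serialization says which classes it accepts.
    interface SerializationPlugin {
        String name();
        boolean accept(Class<?> type);
    }

    // Builds a plug-in that accepts every class assignable to 'supported'.
    static SerializationPlugin plugin(String name, Class<?> supported) {
        return new SerializationPlugin() {
            @Override
            public String name() {
                return name;
            }

            @Override
            public boolean accept(Class<?> type) {
                return supported.isAssignableFrom(type);
            }
        };
    }

    // Returns the name of the first registered plug-in accepting the type, or null.
    static String choose(List<SerializationPlugin> registered, Class<?> type) {
        for (SerializationPlugin p : registered) {
            if (p.accept(type)) {
                return p.name();
            }
        }
        return null;
    }

    // Invented key/value classes for the demonstration.
    static class TextKey implements WritableLike {}
    static class UserRecord implements AvroRecordLike {}

    public static void main(String[] args) {
        // Default setup: only the Writable-style plug-in is registered.
        List<SerializationPlugin> defaults = new ArrayList<>();
        defaults.add(plugin("writable", WritableLike.class));

        // Opt-in setup: the job additionally registers an Avro-style plug-in.
        List<SerializationPlugin> withAvro = new ArrayList<>(defaults);
        withAvro.add(plugin("avro", AvroRecordLike.class));

        System.out.println(choose(defaults, TextKey.class));     // writable
        System.out.println(choose(defaults, UserRecord.class));  // null
        System.out.println(choose(withAvro, UserRecord.class));  // avro
    }
}
```

With only the default entry registered, an Avro-style record finds no
serializer at all, which matches the point above: Avro comes into play at
the intermediate step only once the job opts in.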

On Wed, Aug 22, 2012 at 11:42 AM, Rahul Bhattacharjee
<ra...@gmail.com> wrote:
> Well , thanks a lot Harsh. I though avro was result of hadoop's
> serialization needs.
>
> If avro isn't used for serializing maps outputs and transfer it to other
> reducers then whats used for this , if not avro.
>
> Thanks,
> Rahul
>
> On Wed, Aug 22, 2012 at 11:22 AM, Harsh J <ha...@cloudera.com> wrote:
>>
>> Hi,
>>
>> Hadoop doesn't use Avro serialization on its own. However, Hadoop 2.x
>> does provide an AvroSerialization class you can use optionally to
>> serialize using Avro libraries, and the 2.x distribution does ship an
>> Avro jar along with it.
>>
>> On Wed, Aug 22, 2012 at 11:09 AM, Rahul Bhattacharjee
>> <ra...@gmail.com> wrote:
>> > Hi,
>> >
>> > I was going through the Apache Hadoop's distribution dependencies (jars
>> > in
>> > lib folder) and I could not find avro-1.x.x.jar.
>> >
>> > I though hadoop internally uses avro as its serialization mechanism for
>> > intermediate data transmission (transporting maps output to reducers etc
>> > ),
>> > so hadoop distribution must have avro within it. But it doesn't !
>> >
>> > Can someone enlighten me on this?
>> >
>> > Thanks,
>> > Rahul
>> >
>>
>>
>>
>> --
>> Harsh J
>
>



-- 
Harsh J

Re: Hadoop's Avro dependencies.

Posted by Rahul Bhattacharjee <ra...@gmail.com>.
Well, thanks a lot, Harsh. I thought Avro was a result of Hadoop's
serialization needs.

If Avro isn't used for serializing map outputs and transferring them to
the reducers, then what is used for this?

Thanks,
Rahul

On Wed, Aug 22, 2012 at 11:22 AM, Harsh J <ha...@cloudera.com> wrote:

> Hi,
>
> Hadoop doesn't use Avro serialization on its own. However, Hadoop 2.x
> does provide an AvroSerialization class you can use optionally to
> serialize using Avro libraries, and the 2.x distribution does ship an
> Avro jar along with it.
>
> On Wed, Aug 22, 2012 at 11:09 AM, Rahul Bhattacharjee
> <ra...@gmail.com> wrote:
> > Hi,
> >
> > I was going through the Apache Hadoop distribution's dependencies (jars
> > in the lib folder) and I could not find avro-1.x.x.jar.
> >
> > I thought Hadoop internally used Avro as its serialization mechanism for
> > intermediate data transmission (transporting map outputs to reducers,
> > etc.), so the Hadoop distribution must have Avro within it. But it doesn't!
> >
> > Can someone enlighten me on this?
> >
> > Thanks,
> > Rahul
> >
>
>
>
> --
> Harsh J
>

Re: Hadoop's Avro dependencies.

Posted by Harsh J <ha...@cloudera.com>.
Hi,

Hadoop doesn't use Avro serialization on its own. However, Hadoop 2.x
does provide an AvroSerialization class you can use optionally to
serialize using Avro libraries, and the 2.x distribution does ship an
Avro jar along with it.
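
As a rough illustration of what "optional" means here, the following Java
sketch models how a framework can pick a serialization for a class from a
configured list, taking the first entry that accepts the class. In Hadoop
the list comes from the job configuration (the io.serializations property,
to the best of my knowledge) and the interface is
org.apache.hadoop.io.serializer.Serialization with an accept(Class)
method. Everything below is a simplified, hypothetical stand-in written
without Hadoop or Avro on the classpath, not the actual implementation.

```java
import java.util.Arrays;
import java.util.List;

// Marker interfaces standing in for Hadoop's Writable and for an
// Avro-generated record type (both are assumptions for this sketch).
interface WritableLike {}
interface AvroRecordLike {}

// Simplified stand-in for a pluggable serialization.
interface SerializationSketch {
    boolean accept(Class<?> type);
    String name();
}

// Accepts any class that implements the given marker interface.
class MarkerSerialization implements SerializationSketch {
    private final Class<?> marker;
    private final String name;

    MarkerSerialization(Class<?> marker, String name) {
        this.marker = marker;
        this.name = name;
    }

    @Override
    public boolean accept(Class<?> type) {
        return marker.isAssignableFrom(type);
    }

    @Override
    public String name() {
        return name;
    }
}

// Picks the first configured serialization that accepts the class.
class SerializationPicker {
    private final List<SerializationSketch> configured;

    SerializationPicker(SerializationSketch... configured) {
        this.configured = Arrays.asList(configured);
    }

    String pick(Class<?> type) {
        for (SerializationSketch s : configured) {
            if (s.accept(type)) {
                return s.name();
            }
        }
        return "none";
    }
}

class TextLike implements WritableLike {}     // hypothetical Writable type
class UserRecord implements AvroRecordLike {} // hypothetical Avro type

class PickerDemo {
    // Only the Writable-style serialization is configured.
    static SerializationPicker writableOnly() {
        return new SerializationPicker(
            new MarkerSerialization(WritableLike.class, "writable"));
    }

    // The job has additionally opted in to an Avro-style serialization.
    static SerializationPicker withAvro() {
        return new SerializationPicker(
            new MarkerSerialization(WritableLike.class, "writable"),
            new MarkerSerialization(AvroRecordLike.class, "avro"));
    }

    public static void main(String[] args) {
        System.out.println(writableOnly().pick(TextLike.class));   // prints "writable"
        System.out.println(writableOnly().pick(UserRecord.class)); // prints "none"
        System.out.println(withAvro().pick(UserRecord.class));     // prints "avro"
    }
}
```

In this model, an Avro type is only serializable once the Avro entry has
been added to the configured list, which matches the description above of
Avro as something a job opts in to.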

On Wed, Aug 22, 2012 at 11:09 AM, Rahul Bhattacharjee
<ra...@gmail.com> wrote:
> Hi,
>
> I was going through the Apache Hadoop distribution's dependencies (jars in
> the lib folder) and I could not find avro-1.x.x.jar.
>
> I thought Hadoop internally used Avro as its serialization mechanism for
> intermediate data transmission (transporting map outputs to reducers, etc.),
> so the Hadoop distribution must have Avro within it. But it doesn't!
>
> Can someone enlighten me on this?
>
> Thanks,
> Rahul
>



-- 
Harsh J

unsubscribe

Posted by Tibor Korocz <tk...@gmail.com>.
unsubscribe

unsubscribe

Posted by sathyavageeswaran <sa...@morisonmenon.com>.
unsubscribe

 

From: Rahul Bhattacharjee [mailto:rahul.rec.dgp@gmail.com] 
Sent: 22 August 2012 11:10
To: user@hadoop.apache.org
Subject: Hadoop's Avro dependencies.

 

Hi,

I was going through the Apache Hadoop distribution's dependencies (jars in the lib folder) and I could not find avro-1.x.x.jar.

I thought Hadoop internally used Avro as its serialization mechanism for intermediate data transmission (transporting map outputs to reducers, etc.), so the Hadoop distribution must have Avro within it. But it doesn't!

Can someone enlighten me on this?

Thanks,
Rahul

