Posted to hdfs-user@hadoop.apache.org by Adeel Qureshi <ad...@gmail.com> on 2013/08/31 21:53:49 UTC

custom writablecomparable with complex fields

I want to write a custom writablecomparable object with two List objects
within it ..

public class CompositeKey implements WritableComparable {

private List<JsonKey> groupBy;
private List<JsonKey> sortBy;
...
}

What I am not sure about is how to write the readFields and write methods
for this object. Any help would be appreciated.

Thanks
Adeel

Re: custom writablecomparable with complex fields

Posted by Harsh J <ha...@cloudera.com>.
The easy way is to deserialize each stream into objects and then compare
them, which is pretty much what most of the default comparators do.

Comparing without deserializing the whole stream is much faster and is
the point behind true RawComparators. Read
http://avro.apache.org/docs/current/spec.html#order for example.
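
To make the raw approach concrete, here is a minimal sketch, not a drop-in
comparator. It assumes a byte layout that is not stated anywhere in this
thread: a 4-byte big-endian item count followed by each item as written by
DataOutput.writeUTF (a 2-byte length, then the bytes). The class name and the
serialize helper are hypothetical. The compare method has the same parameter
shape as Hadoop's RawComparator.compare, and in a real job the three small
helpers could likely be replaced by the static compareBytes, readInt and
readUnsignedShort methods on WritableComparator. Items are ordered by their
unsigned bytes, which for plain ASCII names agrees with String.compareTo.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.List;

// Hypothetical sketch: orders two serialized string lists without building objects.
class RawListComparatorSketch {

    // Same parameter shape as RawComparator.compare(b1, s1, l1, b2, s2, l2).
    // l1 and l2 are unused because the count prefix says where the list ends.
    static int compare(byte[] b1, int s1, int l1, byte[] b2, int s2, int l2) {
        int n1 = readInt(b1, s1);
        int n2 = readInt(b2, s2);
        int p1 = s1 + 4; // skip the 4-byte count
        int p2 = s2 + 4;
        int common = Math.min(n1, n2);
        for (int i = 0; i < common; i++) {
            int len1 = readUnsignedShort(b1, p1);
            int len2 = readUnsignedShort(b2, p2);
            int c = compareBytes(b1, p1 + 2, len1, b2, p2 + 2, len2);
            if (c != 0) {
                return c; // first differing item decides; later bytes are never touched
            }
            p1 += 2 + len1;
            p2 += 2 + len2;
        }
        return Integer.compare(n1, n2); // equal prefix: the shorter list sorts first
    }

    static int readInt(byte[] b, int o) {
        return ((b[o] & 0xff) << 24) | ((b[o + 1] & 0xff) << 16)
                | ((b[o + 2] & 0xff) << 8) | (b[o + 3] & 0xff);
    }

    static int readUnsignedShort(byte[] b, int o) {
        return ((b[o] & 0xff) << 8) | (b[o + 1] & 0xff);
    }

    // Lexicographic comparison of two byte ranges, treating bytes as unsigned.
    static int compareBytes(byte[] b1, int s1, int l1, byte[] b2, int s2, int l2) {
        int n = Math.min(l1, l2);
        for (int i = 0; i < n; i++) {
            int x = b1[s1 + i] & 0xff;
            int y = b2[s2 + i] & 0xff;
            if (x != y) {
                return Integer.compare(x, y);
            }
        }
        return Integer.compare(l1, l2);
    }

    // Helper for trying the comparator: writes the assumed layout.
    static byte[] serialize(List<String> names) {
        try {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(buffer);
            out.writeInt(names.size());
            for (String name : names) {
                out.writeUTF(name);
            }
            return buffer.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Because the comparison stops at the first differing item, keys that differ
early are ordered after reading only a few bytes, which is where the speedup
over deserialize-then-compare comes from.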

On Sun, Sep 1, 2013 at 9:13 PM, Adeel Qureshi <ad...@gmail.com> wrote:
> Okay, that makes sense .. so the same order I write is how I can read ..
> Taking it a step further, in the compareTo method, how can I use the bytes
> provided to do a comparison on, let's say, a list object?
>
> On Aug 31, 2013 4:52 PM, "Harsh J" <ha...@cloudera.com> wrote:
>>
>> The idea behind write(…) and readFields(…) is simply that of ordering.
>> You need to write your custom objects (i.e. a representation of them)
>> in order, and read them back in the same order.
>>
>> One way to serialize a list is to first write its length (so you
>> know how many items to read back), and then serialize each item
>> appropriately, using delimiters or a length prefix, just as for the
>> list itself.
>>
>> Mainly, you're required to tackle the serialization/deserialization on
>> your own.
>>
>> This is one of the reasons I highly recommend using a library like
>> Apache Avro instead. It's more powerful, faster, and yet simple to use:
>> http://avro.apache.org/docs/current/gettingstartedjava.html and
>> http://avro.apache.org/docs/current/mr.html. It is also popular and
>> has first-class support in several other Hadoop ecosystem
>> projects, such as Flume and Crunch.
>>
>> On Sun, Sep 1, 2013 at 1:23 AM, Adeel Qureshi <ad...@gmail.com>
>> wrote:
>> > I want to write a custom writablecomparable object with two List objects
>> > within it ..
>> >
>> > public class CompositeKey implements WritableComparable {
>> >
>> > private List<JsonKey> groupBy;
>> > private List<JsonKey> sortBy;
>> > ...
>> > }
>> >
>> > what I am not sure about is how to write
>> >
>> > readFields and write methods for this object. Any help would be
>> > appreciated.
>> >
>> > Thanks
>> > Adeel
>>
>>
>>
>> --
>> Harsh J



-- 
Harsh J

Re: custom writablecomparable with complex fields

Posted by Adeel Qureshi <ad...@gmail.com>.
Okay, that makes sense .. so the same order I write is how I can read ..
Taking it a step further, in the compareTo method, how can I use the bytes
provided to do a comparison on, let's say, a list object?

On Aug 31, 2013 4:52 PM, "Harsh J" <ha...@cloudera.com> wrote:

> The idea behind write(…) and readFields(…) is simply that of ordering.
> You need to write your custom objects (i.e. a representation of them)
> in order, and read them back in the same order.
>
> One way to serialize a list is to first write its length (so you
> know how many items to read back), and then serialize each item
> appropriately, using delimiters or a length prefix, just as for the
> list itself.
>
> Mainly, you're required to tackle the serialization/deserialization on
> your own.
>
> This is one of the reasons I highly recommend using a library like
> Apache Avro instead. It's more powerful, faster, and yet simple to use:
> http://avro.apache.org/docs/current/gettingstartedjava.html and
> http://avro.apache.org/docs/current/mr.html. It is also popular and
> has first-class support in several other Hadoop ecosystem
> projects, such as Flume and Crunch.
>
> On Sun, Sep 1, 2013 at 1:23 AM, Adeel Qureshi <ad...@gmail.com>
> wrote:
> > I want to write a custom writablecomparable object with two List objects
> > within it ..
> >
> > public class CompositeKey implements WritableComparable {
> >
> > private List<JsonKey> groupBy;
> > private List<JsonKey> sortBy;
> > ...
> > }
> >
> > what I am not sure about is how to write
> >
> > readFields and write methods for this object. Any help would be
> > appreciated.
> >
> > Thanks
> > Adeel
>
>
>
> --
> Harsh J
>

Re: custom writablecomparable with complex fields

Posted by Harsh J <ha...@cloudera.com>.
The idea behind write(…) and readFields(…) is simply that of ordering.
You need to write your custom objects (i.e. a representation of them)
in order, and read them back in the same order.

One way to serialize a list is to first write its length (so you
know how many items to read back), and then serialize each item
appropriately, using delimiters or a length prefix, just as for the
list itself.
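
As a rough illustration of the length-prefix idea, here is a minimal sketch.
It makes several assumptions that are not in the original question: JsonKey
is a hypothetical stand-in that wraps a single string, the lists are never
null, and the roundTrip helper exists only to exercise the two methods. It
uses the plain java.io DataOutput and DataInput types, which are the types
Writable's write and readFields receive, so it compiles without Hadoop on
the classpath.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for JsonKey: assumed here to wrap a single string.
class JsonKey {
    String name = "";

    JsonKey() {}

    JsonKey(String name) {
        this.name = name;
    }

    void write(DataOutput out) throws IOException {
        out.writeUTF(name); // writeUTF emits a 2-byte length, then the bytes
    }

    void readFields(DataInput in) throws IOException {
        name = in.readUTF();
    }
}

// In a real job this class would implement WritableComparable<CompositeKey>
// and add compareTo, hashCode and equals; left out so this compiles without Hadoop.
class CompositeKey {
    List<JsonKey> groupBy = new ArrayList<>();
    List<JsonKey> sortBy = new ArrayList<>();

    public void write(DataOutput out) throws IOException {
        writeList(out, groupBy); // fields are written in a fixed order ...
        writeList(out, sortBy);
    }

    public void readFields(DataInput in) throws IOException {
        groupBy = readList(in);  // ... and read back in exactly that order
        sortBy = readList(in);
    }

    private static void writeList(DataOutput out, List<JsonKey> keys) throws IOException {
        out.writeInt(keys.size()); // length prefix: how many items follow
        for (JsonKey key : keys) {
            key.write(out);
        }
    }

    private static List<JsonKey> readList(DataInput in) throws IOException {
        int count = in.readInt();
        List<JsonKey> keys = new ArrayList<>(count);
        for (int i = 0; i < count; i++) {
            JsonKey key = new JsonKey();
            key.readFields(in);
            keys.add(key);
        }
        return keys;
    }

    // Serializes to a byte array and reads it back into a fresh instance.
    static CompositeKey roundTrip(CompositeKey key) {
        try {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            key.write(new DataOutputStream(buffer));
            CompositeKey copy = new CompositeKey();
            copy.readFields(new DataInputStream(new ByteArrayInputStream(buffer.toByteArray())));
            return copy;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Note that readFields replaces both lists rather than appending to them.
Hadoop tends to reuse Writable instances between records, so any state left
over from the previous record has to be discarded on each read.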

Mainly, you're required to tackle the serialization/deserialization on your own.

This is one of the reasons I highly recommend using a library like
Apache Avro instead. It's more powerful, faster, and yet simple to use:
http://avro.apache.org/docs/current/gettingstartedjava.html and
http://avro.apache.org/docs/current/mr.html. It is also popular and
has first-class support in several other Hadoop ecosystem
projects, such as Flume and Crunch.

On Sun, Sep 1, 2013 at 1:23 AM, Adeel Qureshi <ad...@gmail.com> wrote:
> I want to write a custom writablecomparable object with two List objects
> within it ..
>
> public class CompositeKey implements WritableComparable {
>
> private List<JsonKey> groupBy;
> private List<JsonKey> sortBy;
> ...
> }
>
> what I am not sure about is how to write
>
> readFields and write methods for this object. Any help would be appreciated.
>
> Thanks
> Adeel



-- 
Harsh J
