Posted to common-user@hadoop.apache.org by Vikas Jadhav <vi...@gmail.com> on 2013/04/23 07:44:30 UTC

Sorting Values sent to reducer NOT based on KEY (Depending on part of VALUE)

Hi

How can I sort values in Hadoop using Hadoop's standard sorting mechanism
(i.e., the sorting facility the framework already provides)?

Requirement:

1) Values should be sorted depending on some part of the value.

For example, given (KEY, VALUE) pairs:

 (0,"BC,4,XY")
 (1,"DC,1,PQ")
 (2,"EF,0,MN")

The sorted sequence arriving at the reducer should be:

(2,"EF,0,MN")
(1,"DC,1,PQ")
(0,"BC,4,XY")

Here the records are sorted by the second attribute of the value.

Thanks



-- 
  Regards,
  Vikas

Re: Sorting Values sent to reducer NOT based on KEY (Depending on part of VALUE)

Posted by Kai Voigt <k...@123.org>.
Hello,

the design pattern here is to emit the component you want to sort by (second field of your value in your case) as the key in the map phase.

If you also want to keep the sorting by the original key, you need to emit a composite key, consisting of your original key and that part of the value. This technique is called the secondary sort.
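A minimal sketch of this composite-key pattern (class and field names are illustrative, not from the thread; in a real job the class would implement org.apache.hadoop.io.WritableComparable and add write()/readFields(), but the Hadoop dependency is left out here so only the ordering logic is shown):

```java
import java.util.Arrays;
import java.util.List;

// Composite key pairing the original map key with the value field to sort by.
// In a real Hadoop job this would implement WritableComparable<CompositeKey>.
class CompositeKey implements Comparable<CompositeKey> {
    final int originalKey;  // the original map output key (0, 1, 2, ...)
    final int sortField;    // the second field of the value (4, 1, 0, ...)

    CompositeKey(int originalKey, int sortField) {
        this.originalKey = originalKey;
        this.sortField = sortField;
    }

    @Override
    public int compareTo(CompositeKey o) {
        // Primary order: the value field; tie-break on the original key,
        // which preserves the original-key ordering within equal fields.
        int cmp = Integer.compare(sortField, o.sortField);
        return cmp != 0 ? cmp : Integer.compare(originalKey, o.originalKey);
    }

    @Override
    public String toString() {
        return "(" + originalKey + "," + sortField + ")";
    }
}

public class SecondarySortSketch {
    public static void main(String[] args) {
        // The three records from the thread: (key, second value field).
        List<CompositeKey> keys = Arrays.asList(
                new CompositeKey(0, 4),
                new CompositeKey(1, 1),
                new CompositeKey(2, 0));
        keys.sort(null);  // natural ordering, i.e. compareTo above
        System.out.println(keys);  // [(2,0), (1,1), (0,4)]
    }
}
```

This is exactly the order the original question asks for: the record with field value 0 first, then 1, then 4.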

Kai

On 2013-04-23 at 07:44, Vikas Jadhav <vi...@gmail.com> wrote:


-- 
Kai Voigt
k@123.org





Re: Sorting Values sent to reducer NOT based on KEY (Depending on part of VALUE)

Posted by Vikas Jadhav <vi...@gmail.com>.
Thanks for the reply.

I will try to implement this. I think the problem in my case is that I have
modified the mapper's context.write and tried to write the same key-value
pair multiple times. For this purpose I also modified the partitioner class:
my partitioner does not return a single partition number, it returns an
integer array listing the partitions the key-value pair should be written to.




On Tue, Apr 23, 2013 at 1:15 PM, Sofia Georgiakaki <ge...@yahoo.com> wrote:



-- 
  Regards,
  Vikas


Re: Sorting Values sent to reducer NOT based on KEY (Depending on part of VALUE)

Posted by Sofia Georgiakaki <ge...@yahoo.com>.
Hello,

Sorting is done by the sort comparator, which compares keys. A possible solution would be the following:
You could write a custom class that implements WritableComparable (let's call it MyCompositeFieldWritableComparable) and stores your current key together with the part of the value you want the sorting to be based on. As I understand from your description, this writable class will have two IntWritable fields, e.g.
(FieldA, fieldB)

(0,4)
(1,1)
(2,0)
Implement equals(), compareTo(), hashCode(), and the serialization methods write()/readFields() in your custom writable to override the defaults. Sorting before the reduce phase is performed according to the compareTo() implementation of your custom writable, so you can write it in a way that compares only fieldB.

Be careful how you implement MyCompositeFieldWritableComparable.equals() (it is used to group <key, list(values)> in the reducer), MyCompositeFieldWritableComparable.compareTo(), and MyCompositeFieldWritableComparable.hashCode().
So your new KEY class will be MyCompositeFieldWritableComparable.
As an alternative and cleaner implementation, write the MyCompositeFieldWritableComparable class together with a HashOnOneFieldPartitioner class (which extends Partitioner) that does something like this:

@Override
public int getPartition(K key, V value, int numReduceTasks) {
    if (key instanceof MyCompositeFieldWritableComparable) {
        return (((MyCompositeFieldWritableComparable) key).hashCodeBasedOnFieldB()
                & Integer.MAX_VALUE) % numReduceTasks;
    }
    return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
}
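The masking-and-modulo logic above can be exercised outside Hadoop. In this sketch, hashCodeBasedOnFieldB() is assumed to reduce to hashing the fieldB component alone (the thread does not define it); the & Integer.MAX_VALUE mask clears the sign bit so the modulo result is never negative:

```java
// Standalone sketch of the one-field partitioning logic. Assumes hashing
// fieldB alone stands in for hashCodeBasedOnFieldB(), which the thread
// names but does not define.
class FieldBPartitioner {
    static int getPartition(int fieldB, int numReduceTasks) {
        // Mask the sign bit so the result is non-negative, then bucket.
        return (Integer.hashCode(fieldB) & Integer.MAX_VALUE) % numReduceTasks;
    }
}

public class PartitionSketch {
    public static void main(String[] args) {
        // fieldB values 4, 1, 0 from the thread, spread across 3 reducers.
        System.out.println(FieldBPartitioner.getPartition(4, 3)); // 1
        System.out.println(FieldBPartitioner.getPartition(1, 3)); // 1
        System.out.println(FieldBPartitioner.getPartition(0, 3)); // 0
    }
}
```

Note that records with equal fieldB values land on the same reducer, which is what makes the reduce-side grouping of the composite keys work.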




You can also find related articles on the web, e.g. http://riccomini.name/posts/hadoop/2009-11-13-sort-reducer-input-value-hadoop/.

Have a nice day,
Sofia




