Posted to common-dev@hadoop.apache.org by "Klaas Bosteels (JIRA)" <ji...@apache.org> on 2009/03/27 12:02:51 UTC

[jira] Issue Comment Edited: (HADOOP-5528) Binary partitioner

    [ https://issues.apache.org/jira/browse/HADOOP-5528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12688268#action_12688268 ] 

Klaas Bosteels edited comment on HADOOP-5528 at 3/27/09 4:01 AM:
-----------------------------------------------------------------

The revised patch allows the subarray to be defined by means of Python-style offsets:

* {{mapred.binary.partitioner.left.offset}}: left Python-style offset in array
* {{mapred.binary.partitioner.right.offset}}: right Python-style offset in array

The best way to remember how these offsets work is by thinking of them as indices pointing between the array elements, with the left edge of the first element numbered 0, e.g.:

{code}
  +---+---+---+---+---+
  | B | B | B | B | B |
  +---+---+---+---+---+
  0   1   2   3   4   5
 -5  -4  -3  -2  -1
{code}

The first row of numbers gives the positions of the offsets 0...5 in the array; the second row gives the corresponding negative offsets. When _i_ and _j_ are specified as the left and right offsets, respectively, all bytes between the edges labeled _i_ and _j_ are taken into account for the partitioning.
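
To make the edge numbering concrete, here is a minimal Java sketch of how such offsets could be resolved and used to hash a subarray. It is not taken from the patch: the class {{PythonOffsets}} and its methods are hypothetical names, and clamping out-of-range offsets the way Python slicing does is an assumption (the patch may reject them instead).

```java
import java.util.Arrays;

/** Hypothetical helper, not part of the patch: resolves Python-style offsets. */
final class PythonOffsets {
    private PythonOffsets() {}

    /** Maps an offset to an edge position in [0, length]; negative offsets count from the right edge. */
    static int resolve(int offset, int length) {
        int edge = offset < 0 ? offset + length : offset;
        // Assumption: clamp out-of-range offsets, as Python slicing does.
        return Math.max(0, Math.min(length, edge));
    }

    /** Returns the bytes between the edges labeled left and right (empty if the edges cross). */
    static byte[] slice(byte[] bytes, int left, int right) {
        int from = resolve(left, bytes.length);
        int to = resolve(right, bytes.length);
        if (from >= to) {
            return new byte[0];
        }
        return Arrays.copyOfRange(bytes, from, to);
    }

    /** Picks a partition in [0, numPartitions) by hashing only the selected bytes. */
    static int partition(byte[] keyBytes, int left, int right, int numPartitions) {
        int hash = Arrays.hashCode(slice(keyBytes, left, right));
        // Mask the sign bit so the modulo result is never negative.
        return (hash & Integer.MAX_VALUE) % numPartitions;
    }
}
```

With a five-byte key, {{PythonOffsets.slice(key, 1, -1)}} selects the three middle bytes, since offset -1 labels the same edge as offset 4 in the diagram above. Two keys that differ only in their first and last bytes would then land in the same partition.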
 
More generally, the indexing logic can now be customized by specifying the {{BinaryPartitioner.Indexer}} classes to be used via the following properties:

* {{mapred.binary.partitioner.left.indexer.class}}
* {{mapred.binary.partitioner.right.indexer.class}}

By default, {{FirstIndexer}} and {{LastIndexer}} are used (i.e. the whole byte array is taken into account for the hashing). Setting the offset properties triggers the use of {{PosOffsetIndexer}} and/or {{NegOffsetIndexer}}, which implement the indexing by means of Python-style offsets.
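
As a rough illustration of how the four indexers might relate, the following Java sketch reuses the class names mentioned above. Everything else is assumed for illustration: the interface shape ({{edge(int length)}}), the wrapper class {{IndexerSketch}}, the {{forOffset}} helper, and the rule that the sign of the offset selects between the positive and negative indexer. The actual signatures in the patch may differ.

```java
/** Hypothetical sketch; the interface shape is assumed, not taken from the patch. */
final class IndexerSketch {
    private IndexerSketch() {}

    /** Picks an edge position in [0, length] for a key of the given length. */
    interface Indexer {
        int edge(int length);
    }

    /** Default left indexer: the left edge of the first byte. */
    static final class FirstIndexer implements Indexer {
        public int edge(int length) {
            return 0;
        }
    }

    /** Default right indexer: the right edge of the last byte. */
    static final class LastIndexer implements Indexer {
        public int edge(int length) {
            return length;
        }
    }

    /** Non-negative offset, counted from the left edge of the array. */
    static final class PosOffsetIndexer implements Indexer {
        private final int offset;

        PosOffsetIndexer(int offset) {
            if (offset < 0) {
                throw new IllegalArgumentException("offset must be >= 0");
            }
            this.offset = offset;
        }

        public int edge(int length) {
            return Math.min(offset, length); // assumption: clamp to the array
        }
    }

    /** Negative offset, counted from the right edge of the array. */
    static final class NegOffsetIndexer implements Indexer {
        private final int offset;

        NegOffsetIndexer(int offset) {
            if (offset >= 0) {
                throw new IllegalArgumentException("offset must be < 0");
            }
            this.offset = offset;
        }

        public int edge(int length) {
            return Math.max(0, length + offset); // assumption: clamp to the array
        }
    }

    /** Returns the fallback when no offset is configured (null), else an offset-based indexer. */
    static Indexer forOffset(Integer offset, Indexer fallback) {
        if (offset == null) {
            return fallback;
        }
        return offset < 0 ? new NegOffsetIndexer(offset) : new PosOffsetIndexer(offset);
    }
}
```

In this sketch, leaving both offsets unset yields the edges 0 and {{length}}, which corresponds to hashing the whole byte array.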

> Binary partitioner
> ------------------
>
>                 Key: HADOOP-5528
>                 URL: https://issues.apache.org/jira/browse/HADOOP-5528
>             Project: Hadoop Core
>          Issue Type: New Feature
>          Components: mapred
>            Reporter: Klaas Bosteels
>            Assignee: Klaas Bosteels
>         Attachments: HADOOP-5528.patch, HADOOP-5528.patch, HADOOP-5528.patch, HADOOP-5528.patch, HADOOP-5528.patch
>
>
> It would be useful to have a {{BinaryPartitioner}} that partitions {{BinaryComparable}} keys by hashing a configurable part of the byte array corresponding to each key.
