Posted to mapreduce-issues@hadoop.apache.org by "Amareshwari Sriramadasu (JIRA)" <ji...@apache.org> on 2009/10/06 11:52:31 UTC

[jira] Commented: (MAPREDUCE-1066) Add a unit test to test all the apis in mapreduce.lib.join

    [ https://issues.apache.org/jira/browse/MAPREDUCE-1066?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12762571#action_12762571 ] 

Amareshwari Sriramadasu commented on MAPREDUCE-1066:
----------------------------------------------------

Chris Douglas has suggested the following unit test to test join.

Given three sorted, equally partitioned datasets { A, B, C }.
* For each source, let every value be the same prime p, with a different prime for each source.
* Define an operator "count" derived from MultiFilterRecordReader.
* The correct output of count(A,B,C) should be [ p0^i, p1^j, p2^k ] such that i, j, k are the number of values in each source iterator. The output for a given position in the tuple is the product of all the values it receives (or 1 if that source does not contain the key).
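
As a rough model of that expected result (plain Python, not the Hadoop API; the name expected_count and the list-of-key-lists input are assumptions made purely for illustration), the tuple for each key can be derived from how often the key occurs in each source:

```python
from collections import Counter

def expected_count(sources, primes):
    """Return {key: [p0**i, p1**j, ...]}, where i, j, ... are the number of
    records carrying that key in each source (p**0 == 1 when it is absent)."""
    counts = [Counter(keys) for keys in sources]
    all_keys = sorted(set().union(*counts))
    # Counter returns 0 for a missing key, so an absent source yields 1.
    return {k: [p ** c[k] for p, c in zip(primes, counts)] for k in all_keys}
```

This only describes what a correct count operator should emit; it says nothing about how MultiFilterRecordReader would be subclassed to produce it.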

e.g. Given count(A,B,C) with the following keys, where every record
in A, B, and C has the value 2, 3, and 5, respectively:
A = k0, k0, k0, k1, k2
B = k0, k1
C = k0, k0, k2, k2

The output would be 600 = (2^3 * 3^1 * 5^2), 6 = (2^1 * 3^1 * 5^0),
and 50 = (2^1 * 3^0 * 5^2) from the following trace:
(k0, [ 2, 3, 5 ])
(k0, [ 2, 3, 25 ])
(k0, [ 4, 3, 5 ])
(k0, [ 4, 3, 25])
(k0, [ 8, 3, 5])
(k0, [ 8, 3, 25])
(k1, [ 2, 3, 1])
(k2, [ 2, 1, 5])
(k2, [ 2, 1, 25])
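
One way to read this trace is that each position holds the running product of the values seen so far in that source's iterator, so the last tuple for a key carries the final counts. The sketch below reproduces the nine rows under that reading. It is a plain-Python model with hypothetical names (trace, SOURCES, PRIMES), not code from mapreduce.lib.join, and the iteration order is an assumption chosen to match the rows above.

```python
from collections import Counter
from itertools import product
from math import prod

PRIMES = (2, 3, 5)
SOURCES = (
    ["k0", "k0", "k0", "k1", "k2"],  # A
    ["k0", "k1"],                    # B
    ["k0", "k0", "k2", "k2"],        # C
)

def trace(sources, primes):
    counts = [Counter(s) for s in sources]
    rows = []
    for key in sorted(set().union(*counts)):
        # A source without the key still occupies one (empty) slot.
        spans = [range(max(c[key], 1)) for c in counts]
        for idx in product(*spans):
            # Position value: p^(records consumed so far), or 1 if absent.
            rows.append((key, [p ** (i + 1) if c[key] else 1
                               for p, c, i in zip(primes, counts, idx)]))
    return rows

rows = trace(SOURCES, PRIMES)
# The last row emitted for each key overwrites earlier ones.
final = {key: prod(vals) for key, vals in rows}
print(final)  # {'k0': 600, 'k1': 6, 'k2': 50}
```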

Run a job with an identity map and a combiner/reducer that computes the product of the values, and verify that its output matches the output of count(A, B, C).
Alternatively, add a unary operator mult(A) that computes the product of its values for a given key and verify that it matches outer(mult(A), mult(B), mult(C)).
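
The alternative check can be modeled in the same spirit. In this sketch, mult and outer are ordinary Python functions standing in for the proposed operators (their signatures are assumptions, not the join framework's API), and a source that lacks a key is taken to contribute 1, as described above:

```python
from math import prod

def mult(records):
    """Product of the values for each key, as a product reducer would compute."""
    out = {}
    for key, value in records:
        out[key] = out.get(key, 1) * value
    return out

def outer(*reduced):
    """Outer join of per-source results; a source lacking the key contributes 1."""
    keys = sorted(set().union(*reduced))
    return {k: [r.get(k, 1) for r in reduced] for k in keys}

# Same keys as the example, with values 2, 3, and 5 for A, B, and C.
A = [(k, 2) for k in ["k0", "k0", "k0", "k1", "k2"]]
B = [(k, 3) for k in ["k0", "k1"]]
C = [(k, 5) for k in ["k0", "k0", "k2", "k2"]]

joined = outer(mult(A), mult(B), mult(C))
products = {k: prod(v) for k, v in joined.items()}
print(joined)    # {'k0': [8, 3, 25], 'k1': [2, 3, 1], 'k2': [2, 1, 25]}
print(products)  # {'k0': 600, 'k1': 6, 'k2': 50}
```

A real test would compare these tuples against what the count operator emits for the same inputs.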


> Add a unit test  to test all the apis in mapreduce.lib.join 
> ------------------------------------------------------------
>
>                 Key: MAPREDUCE-1066
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1066
>             Project: Hadoop Map/Reduce
>          Issue Type: Test
>          Components: test
>    Affects Versions: 0.21.0
>            Reporter: Amareshwari Sriramadasu
>             Fix For: 0.21.0
>
>
> Add a unit test  to test all the api/features in mapreduce.lib.join 
