Posted to issues@tez.apache.org by "Bikas Saha (JIRA)" <ji...@apache.org> on 2014/08/26 22:59:59 UTC

[jira] [Commented] (TEZ-1499) Add OrderedJoinExample to tez-examples

    [ https://issues.apache.org/jira/browse/TEZ-1499?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14111326#comment-14111326 ] 

Bikas Saha commented on TEZ-1499:
---------------------------------

I was planning to add this variation to the same join example. It mainly involves changing the edge type. The processor would still load one side into a hash table, though with ordered inputs it could do so on a per-key basis.
Keeping it in the same example lets users see the variations side by side in the same code, so I would recommend adding this as a variation of the existing example.
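
To illustrate the per-key idea, here is a minimal sketch in plain Java, not taken from tez-examples. It assumes both inputs arrive sorted by key (as an ordered edge would deliver them) and models each record as a hypothetical two-element String array {key, value}. The class and method names are invented for illustration, and no Tez APIs are used. Only the left-side values for the current key are buffered, rather than a whole input.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: join two key-sorted inputs, buffering one key at a time.
class PerKeyJoinSketch {

    // Each record is {key, value}; both lists must already be sorted by key.
    static List<String> join(List<String[]> left, List<String[]> right) {
        List<String> out = new ArrayList<>();
        int i = 0;
        int j = 0;
        while (i < left.size() && j < right.size()) {
            int cmp = left.get(i)[0].compareTo(right.get(j)[0]);
            if (cmp < 0) {
                i++; // left key has no match on the right, skip it
            } else if (cmp > 0) {
                j++; // right key has no match on the left, skip it
            } else {
                String key = left.get(i)[0];
                // Buffer only the left-side values that share the current key.
                List<String> buffered = new ArrayList<>();
                while (i < left.size() && left.get(i)[0].equals(key)) {
                    buffered.add(left.get(i)[1]);
                    i++;
                }
                // Stream the right-side values for the same key against the buffer.
                while (j < right.size() && right.get(j)[0].equals(key)) {
                    for (String leftValue : buffered) {
                        out.add(key + ":" + leftValue + "," + right.get(j)[1]);
                    }
                    j++;
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String[]> left = Arrays.asList(
                new String[] {"a", "1"}, new String[] {"b", "2"}, new String[] {"b", "3"});
        List<String[]> right = Arrays.asList(
                new String[] {"b", "x"}, new String[] {"c", "y"});
        System.out.println(join(left, right));
    }
}
```

The memory needed is bounded by the largest group of values for a single key on the buffered side, not by the size of either dataset.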

> Add OrderedJoinExample to tez-examples
> --------------------------------------
>
>                 Key: TEZ-1499
>                 URL: https://issues.apache.org/jira/browse/TEZ-1499
>             Project: Apache Tez
>          Issue Type: Bug
>            Reporter: Jeff Zhang
>            Assignee: Jeff Zhang
>
> In the current join example, the inputs of JoinProcessor are unordered, so it always needs to load one input into memory and stream the other. This only works when one dataset is small enough to fit into memory (even when broadcast is not used, memory may not be enough). So I'd like to add another join example that makes the inputs of JoinProcessor ordered (using OrderedPartitionedKVEdgeConfig). This kind of join could be used when both datasets are large.
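
For contrast, the in-memory strategy described in the issue can be sketched as follows. This is a plain Java illustration under stated assumptions, not the JoinProcessor code from tez-examples. Records are modeled as hypothetical {key, value} String arrays, and the class and method names are invented. Because neither input is ordered, the whole of one side has to sit in a hash table before the other side can be streamed.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: hash join that loads one entire input into memory.
class HashJoinSketch {

    // Each record is {key, value}; neither input needs to be sorted.
    static List<String> join(List<String[]> small, Iterable<String[]> streamed) {
        // Build phase: the whole "small" side is held in memory.
        Map<String, List<String>> table = new HashMap<>();
        for (String[] record : small) {
            table.computeIfAbsent(record[0], k -> new ArrayList<>()).add(record[1]);
        }
        // Probe phase: the other side is read once, record by record.
        List<String> out = new ArrayList<>();
        for (String[] record : streamed) {
            List<String> matches = table.get(record[0]);
            if (matches == null) {
                continue; // no matching key on the in-memory side
            }
            for (String value : matches) {
                out.add(record[0] + ":" + value + "," + record[1]);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String[]> small = Arrays.asList(
                new String[] {"b", "2"}, new String[] {"a", "1"});
        List<String[]> streamed = Arrays.asList(
                new String[] {"c", "z"}, new String[] {"b", "x"});
        System.out.println(join(small, streamed));
    }
}
```

Memory use here grows with the size of the loaded side, which is the limitation the issue raises for the case where both datasets are large.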



--
This message was sent by Atlassian JIRA
(v6.2#6252)