Posted to dev@pig.apache.org by "Cheolsoo Park (JIRA)" <ji...@apache.org> on 2014/03/04 04:16:21 UTC
[jira] [Updated] (PIG-3795) Parallelism specified by user is not
honored if default parallelism is set to a higher value
[ https://issues.apache.org/jira/browse/PIG-3795?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Cheolsoo Park updated PIG-3795:
-------------------------------
Attachment: PIG-3795-1.patch
The attached patch makes two changes:
# Remove the "parallelism overwrite" code from TezCompiler.
# Explicitly set the parallelism of TezOperator in visitPOGlobalRearrange().
I think there is no reason to overwrite the parallelism of a vertex as long as we explicitly set it for every blocking operator. Please let me know if you think otherwise.
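
To illustrate the idea behind the two changes, here is a minimal sketch. It does not use Pig's classes: Vertex, startVertexAtBlockingOp() and appendNonBlockingOp() are hypothetical stand-ins, and the use of -1 for "no PARALLEL clause" is an assumption made for the sketch. The parallelism is set once, when a blocking operator starts a vertex, and later operators in the same vertex never change it.

```java
// Hypothetical stand-ins for illustration; these are not Pig's TezOperator or TezCompiler.
final class ExplicitParallelismSketch {
    // Assumption for this sketch: -1 means the user gave no PARALLEL clause.
    static final int UNSET = -1;

    /** Minimal model of a vertex that carries a requested parallelism. */
    static final class Vertex {
        int requestedParallelism = UNSET;
    }

    /** Set the parallelism explicitly when a blocking operator starts the vertex. */
    static void startVertexAtBlockingOp(Vertex v, int userParallel, int defaultParallel) {
        v.requestedParallelism = (userParallel != UNSET) ? userParallel : defaultParallel;
    }

    /** Append a non-blocking operator; the vertex parallelism is deliberately left alone. */
    static void appendNonBlockingOp(Vertex v, int opParallel) {
        // No "take the larger value" update here, so the explicit setting survives.
    }

    public static void main(String[] args) {
        int defaultParallel = 200;

        Vertex cogroupVertex = new Vertex();
        startVertexAtBlockingOp(cogroupVertex, 10, defaultParallel);  // cogroup ... parallel 10
        appendNonBlockingOp(cogroupVertex, defaultParallel);          // local rearrange of the join

        Vertex joinVertex = new Vertex();
        startVertexAtBlockingOp(joinVertex, UNSET, defaultParallel);  // join without PARALLEL

        System.out.println("cogroup=" + cogroupVertex.requestedParallelism
                + ", join=" + joinVertex.requestedParallelism);
    }
}
```

With this model the cogroup vertex keeps 10 and the join vertex falls back to the default of 200, which is the behavior the issue asks for.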
> Parallelism specified by user is not honored if default parallelism is set to a higher value
> --------------------------------------------------------------------------------------------
>
> Key: PIG-3795
> URL: https://issues.apache.org/jira/browse/PIG-3795
> Project: Pig
> Issue Type: Sub-task
> Components: tez
> Affects Versions: tez-branch
> Reporter: Cheolsoo Park
> Assignee: Cheolsoo Park
> Fix For: tez-branch
>
> Attachments: PIG-3795-1.patch
>
>
> Let's say you have a query like this:
> {code}
> set default_parallel 200;
> x = cogroup foo by a, bar by b parallel 10;
> y = join x by c, z by d;
> {code}
> I would expect the cogroup to run with a parallelism of 10 and the join with a parallelism of 200. However, the parallelism of the cogroup is also set to 200.
> Here is where the default parallelism overwrites the user-specified parallelism.
> {code:title=TezCompiler.java#L390}
> if (op.getRequestedParallelism() > curTezOp.getRequestedParallelism()) {
>     curTezOp.setRequestedParallelism(op.getRequestedParallelism());
> }
> {code}
> In the above example, "op" is the POLocalRearrange of the join, and "curTezOp" is the TezOperator that contains both the POPackage of the cogroup and the POLocalRearrange of the join.
> Here is what the TezOperator looks like:
> {code}
> | join_allocs_mop: Local Rearrange[tuple]{long}(false) - scope-134 -> null
> | | |
> | | Project[long][10] - scope-135
> | |
> | |---join_allocs_subscrn: New For Each(true)[bag] - scope-75
> | | |
> | | POUserFunc(org.apache.pig.scripting.jython.JythonFunction)[bag] - scope-70
> | | |
> | | |---POUserFunc(org.apache.pig.builtin.TOTUPLE)[tuple] - scope-69
> | | |
> | | |---Project[bag][0] - scope-67
> | | |
> | | |---RelationToExpressionProject[bag][*] - scope-68
> | | |
> | | |---ab_exp_63_day_subscrn_d_ordered: POSort[bag]() - scope-74
> | | | |
> | | | Project[chararray][9] - scope-73
> | | |
> | | |---Project[bag][1] - scope-72
> | |
> | |---New For Each(false,false)[bag] - scope-66
> | | |
> | | Project[bag][1] - scope-62
> | | |
> | | Project[bag][2] - scope-64
> | |
> | |---abNonmemberByCustomer: Package(Packager)[tuple]{long} - scope-57
> {code}
> The problem is that the parallelism of the root (POPackage) is overwritten by that of the leaves (POLocalRearrange) because the latter (200) is greater than the former (10).
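
The overwrite described in the issue can be reproduced with a small standalone model. This is only a sketch: mergeByMax() is a hypothetical helper that mirrors the quoted comparison from TezCompiler.java, and the values 10 and 200 come from the example query.

```java
// Hypothetical model of the "take the larger value" rule quoted from TezCompiler.java.
final class OverwriteSketch {
    /** Return the vertex parallelism after an operator is added under the max rule. */
    static int mergeByMax(int vertexParallelism, int opParallelism) {
        if (opParallelism > vertexParallelism) {
            return opParallelism; // the operator's larger value replaces the vertex's value
        }
        return vertexParallelism;
    }

    public static void main(String[] args) {
        int cogroupParallel = 10;              // from "parallel 10" on the cogroup
        int joinLocalRearrangeParallel = 200;  // from "set default_parallel 200"

        // The vertex starts at 10 (POPackage of the cogroup) and then receives
        // the POLocalRearrange of the join, which carries the default of 200.
        int result = mergeByMax(cogroupParallel, joinLocalRearrangeParallel);
        System.out.println(result); // prints 200
    }
}
```

Because the rule only ever raises the value, any user-specified parallelism lower than the default is lost as soon as an operator carrying the default lands in the same vertex.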