Posted to dev@pig.apache.org by "liyunzhang_intel (JIRA)" <ji...@apache.org> on 2016/01/14 03:44:39 UTC
[jira] [Updated] (PIG-4522) Remove unnecessary store and load when
POSplit is encountered
[ https://issues.apache.org/jira/browse/PIG-4522?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
liyunzhang_intel updated PIG-4522:
----------------------------------
Release Note: (was: This feature has been fixed in PIG-4594)
This feature has been fixed in PIG-4594
> Remove unnecessary store and load when POSplit is encountered
> -------------------------------------------------------------
>
> Key: PIG-4522
> URL: https://issues.apache.org/jira/browse/PIG-4522
> Project: Pig
> Issue Type: Sub-task
> Components: spark
> Reporter: liyunzhang_intel
> Assignee: liyunzhang_intel
> Fix For: spark-branch
>
> Attachments: PIG-4522.patch
>
>
> pig script:
> {code}
> A = load './testSplit.txt' as (f1:int, f2:int,f3:int);
> split A into x if f1<7, y if f2==5, z if (f3<6 or f3>6);
> store x into './testSplit_x.out';
> store y into './testSplit_y.out';
> store z into './testSplit_z.out';
> explain x;
> explain y;
> explain z;
> {code}
> spark plan:
> {code}
> #The Spark node relations are:
> #-----------------------------------------------------#
> scope-17->scope-20
> scope-20
> #--------------------------------------------------
> # Spark Plan
> #--------------------------------------------------
> Spark node scope-17
> Store(hdfs://zly1.sh.intel.com:8020/tmp/temp-1920285848/tmp-1477385839:org.apache.pig.impl.io.InterStorage) - scope-18
> |
> |---A: New For Each(false,false,false)[bag] - scope-10
> | |
> | Cast[int] - scope-2
> | |
> | |---Project[bytearray][0] - scope-1
> | |
> | Cast[int] - scope-5
> | |
> | |---Project[bytearray][1] - scope-4
> | |
> | Cast[int] - scope-8
> | |
> | |---Project[bytearray][2] - scope-7
> |
> |---A: Load(hdfs://zly1.sh.intel.com:8020/user/root/testSplit.txt:org.apache.pig.builtin.PigStorage) - scope-0--------
> Spark node scope-20
> x: Store(fakefile:org.apache.pig.builtin.PigStorage) - scope-16
> |
> |---x: Filter[bag] - scope-12
> | |
> | Less Than[boolean] - scope-15
> | |
> | |---Project[int][0] - scope-13
> | |
> | |---Constant(7) - scope-14
> |
> |---Load(hdfs://zly1.sh.intel.com:8020/tmp/temp-1920285848/tmp-1477385839:org.apache.pig.impl.io.InterStorage) - scope-19--------
> #-----------------------------------------------------#
> #The Spark node relations are:
> #-----------------------------------------------------#
> scope-38->scope-41
> scope-41
> #--------------------------------------------------
> # Spark Plan
> #--------------------------------------------------
> Spark node scope-38
> Store(hdfs://zly1.sh.intel.com:8020/tmp/temp-1920285848/tmp-918933337:org.apache.pig.impl.io.InterStorage) - scope-39
> |
> |---A: New For Each(false,false,false)[bag] - scope-31
> | |
> | Cast[int] - scope-23
> | |
> | |---Project[bytearray][0] - scope-22
> | |
> | Cast[int] - scope-26
> | |
> | |---Project[bytearray][1] - scope-25
> | |
> | Cast[int] - scope-29
> | |
> | |---Project[bytearray][2] - scope-28
> |
> |---A: Load(hdfs://zly1.sh.intel.com:8020/user/root/testSplit.txt:org.apache.pig.builtin.PigStorage) - scope-21--------
> Spark node scope-41
> y: Store(fakefile:org.apache.pig.builtin.PigStorage) - scope-37
> |
> |---y: Filter[bag] - scope-33
> | |
> | Equal To[boolean] - scope-36
> | |
> | |---Project[int][1] - scope-34
> | |
> | |---Constant(5) - scope-35
> |
> |---Load(hdfs://zly1.sh.intel.com:8020/tmp/temp-1920285848/tmp-918933337:org.apache.pig.impl.io.InterStorage) - scope-40--------
> #-----------------------------------------------------#
> #The Spark node relations are:
> #-----------------------------------------------------#
> scope-63->scope-66
> scope-66
> #--------------------------------------------------
> # Spark Plan
> #--------------------------------------------------
> Spark node scope-63
> Store(hdfs://zly1.sh.intel.com:8020/tmp/temp-1920285848/tmp1444529161:org.apache.pig.impl.io.InterStorage) - scope-64
> |
> |---A: New For Each(false,false,false)[bag] - scope-52
> | |
> | Cast[int] - scope-44
> | |
> | |---Project[bytearray][0] - scope-43
> | |
> | Cast[int] - scope-47
> | |
> | |---Project[bytearray][1] - scope-46
> | |
> | Cast[int] - scope-50
> | |
> | |---Project[bytearray][2] - scope-49
> |
> |---A: Load(hdfs://zly1.sh.intel.com:8020/user/root/testSplit.txt:org.apache.pig.builtin.PigStorage) - scope-42--------
> Spark node scope-66
> z: Store(fakefile:org.apache.pig.builtin.PigStorage) - scope-62
> |
> |---z: Filter[bag] - scope-54
> | |
> | Or[boolean] - scope-61
> | |
> | |---Less Than[boolean] - scope-57
> | | |
> | | |---Project[int][2] - scope-55
> | | |
> | | |---Constant(6) - scope-56
> | |
> | |---Greater Than[boolean] - scope-60
> | |
> | |---Project[int][2] - scope-58
> | |
> | |---Constant(6) - scope-59
> |
> |---Load(hdfs://zly1.sh.intel.com:8020/tmp/temp-1920285848/tmp1444529161:org.apache.pig.impl.io.InterStorage) - scope-65--------
> {code}
> Scope-18 (Store) and Scope-19 (Load) are not necessary and should be removed.
> Scope-39 (Store) and Scope-40 (Load) are not necessary and should be removed.
> Scope-64 (Store) and Scope-65 (Load) are not necessary and should be removed.
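
To make the requested change concrete, here is a minimal sketch of the idea for the x branch, written as a toy model in Python rather than against Pig's real classes. The `Op` type, the `merge_store_load` helper, and the shortened paths are all hypothetical names introduced for illustration. This is not the patch attached to this issue, nor the change made in PIG-4594. It assumes each Spark node is a linear chain of operators listed in data-flow order, and that an intermediate Store/Load pair can be recognised by its shared temporary path.

```python
from dataclasses import dataclass
from typing import List, Optional


# Hypothetical, simplified stand-in for a physical operator (not a Pig class).
@dataclass(frozen=True)
class Op:
    kind: str                   # e.g. "Load", "ForEach", "Filter", "Store"
    scope: int                  # scope number as printed by explain
    path: Optional[str] = None  # file path for Load/Store operators


def merge_store_load(upstream: List[Op], downstream: List[Op]) -> List[Op]:
    """Join two operator chains (in data-flow order) by dropping the
    intermediate Store/Load pair that writes and re-reads the same path."""
    if (upstream and downstream
            and upstream[-1].kind == "Store"
            and downstream[0].kind == "Load"
            and upstream[-1].path is not None
            and upstream[-1].path == downstream[0].path):
        return upstream[:-1] + downstream[1:]
    raise ValueError("chains are not linked by a matching Store/Load pair")


# Toy version of the plan for x; the paths are shortened for readability.
TMP = "tmp-1477385839"
node_17 = [Op("Load", 0, "testSplit.txt"), Op("ForEach", 10), Op("Store", 18, TMP)]
node_20 = [Op("Load", 19, TMP), Op("Filter", 12), Op("Store", 16, "fakefile")]

merged = merge_store_load(node_17, node_20)
print([f"{op.kind}-{op.scope}" for op in merged])
# ['Load-0', 'ForEach-10', 'Filter-12', 'Store-16']
```

In this toy model the merged chain no longer contains scope-18 or scope-19, which is the shape the description asks for. The same reasoning would apply to the pairs scope-39/scope-40 and scope-64/scope-65.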
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)