You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@pig.apache.org by "liyunzhang_intel (JIRA)" <ji...@apache.org> on 2015/04/28 03:39:05 UTC

[jira] [Created] (PIG-4522) Remove unnecessary store and load when POSplit is encounted

liyunzhang_intel created PIG-4522:
-------------------------------------

             Summary: Remove unnecessary store and load when POSplit is encounted
                 Key: PIG-4522
                 URL: https://issues.apache.org/jira/browse/PIG-4522
             Project: Pig
          Issue Type: Sub-task
            Reporter: liyunzhang_intel
            Assignee: liyunzhang_intel


pig script:
{code}
A = load './testSplit.txt' as (f1:int, f2:int,f3:int);
split A into x if f1<7, y if f2==5, z if (f3<6 or f3>6);
store x into './testSplit_x.out';
store y into './testSplit_y.out';
store z into './testSplit_z.out';
explain x; 
explain y;
explain z;
{code}

spark plan:
{code}
#The Spark node relations are:
#-----------------------------------------------------#
scope-17->scope-20 
scope-20
#--------------------------------------------------
# Spark Plan                                  
#--------------------------------------------------

Spark node scope-17
Store(hdfs://zly1.sh.intel.com:8020/tmp/temp-1920285848/tmp-1477385839:org.apache.pig.impl.io.InterStorage) - scope-18
|
|---A: New For Each(false,false,false)[bag] - scope-10
    |   |
    |   Cast[int] - scope-2
    |   |
    |   |---Project[bytearray][0] - scope-1
    |   |
    |   Cast[int] - scope-5
    |   |
    |   |---Project[bytearray][1] - scope-4
    |   |
    |   Cast[int] - scope-8
    |   |
    |   |---Project[bytearray][2] - scope-7
    |
    |---A: Load(hdfs://zly1.sh.intel.com:8020/user/root/testSplit.txt:org.apache.pig.builtin.PigStorage) - scope-0--------

Spark node scope-20
x: Store(fakefile:org.apache.pig.builtin.PigStorage) - scope-16
|
|---x: Filter[bag] - scope-12
    |   |
    |   Less Than[boolean] - scope-15
    |   |
    |   |---Project[int][0] - scope-13
    |   |
    |   |---Constant(7) - scope-14
    |
    |---Load(hdfs://zly1.sh.intel.com:8020/tmp/temp-1920285848/tmp-1477385839:org.apache.pig.impl.io.InterStorage) - scope-19--------

#-----------------------------------------------------#
#The Spark node relations are:
#-----------------------------------------------------#
scope-38->scope-41 
scope-41
#--------------------------------------------------
# Spark Plan                                  
#--------------------------------------------------

Spark node scope-38
Store(hdfs://zly1.sh.intel.com:8020/tmp/temp-1920285848/tmp-918933337:org.apache.pig.impl.io.InterStorage) - scope-39
|
|---A: New For Each(false,false,false)[bag] - scope-31
    |   |
    |   Cast[int] - scope-23
    |   |
    |   |---Project[bytearray][0] - scope-22
    |   |
    |   Cast[int] - scope-26
    |   |
    |   |---Project[bytearray][1] - scope-25
    |   |
    |   Cast[int] - scope-29
    |   |
    |   |---Project[bytearray][2] - scope-28
    |
    |---A: Load(hdfs://zly1.sh.intel.com:8020/user/root/testSplit.txt:org.apache.pig.builtin.PigStorage) - scope-21--------

Spark node scope-41
y: Store(fakefile:org.apache.pig.builtin.PigStorage) - scope-37
|
|---y: Filter[bag] - scope-33
    |   |
    |   Equal To[boolean] - scope-36
    |   |
    |   |---Project[int][1] - scope-34
    |   |
    |   |---Constant(5) - scope-35
    |
    |---Load(hdfs://zly1.sh.intel.com:8020/tmp/temp-1920285848/tmp-918933337:org.apache.pig.impl.io.InterStorage) - scope-40--------

#-----------------------------------------------------#
#The Spark node relations are:
#-----------------------------------------------------#
scope-63->scope-66 
scope-66
#--------------------------------------------------
# Spark Plan                                  
#--------------------------------------------------

Spark node scope-63
Store(hdfs://zly1.sh.intel.com:8020/tmp/temp-1920285848/tmp1444529161:org.apache.pig.impl.io.InterStorage) - scope-64
|
|---A: New For Each(false,false,false)[bag] - scope-52
    |   |
    |   Cast[int] - scope-44
    |   |
    |   |---Project[bytearray][0] - scope-43
    |   |
    |   Cast[int] - scope-47
    |   |
    |   |---Project[bytearray][1] - scope-46
    |   |
    |   Cast[int] - scope-50
    |   |
    |   |---Project[bytearray][2] - scope-49
    |
    |---A: Load(hdfs://zly1.sh.intel.com:8020/user/root/testSplit.txt:org.apache.pig.builtin.PigStorage) - scope-42--------

Spark node scope-66
z: Store(fakefile:org.apache.pig.builtin.PigStorage) - scope-62
|
|---z: Filter[bag] - scope-54
    |   |
    |   Or[boolean] - scope-61
    |   |
    |   |---Less Than[boolean] - scope-57
    |   |   |
    |   |   |---Project[int][2] - scope-55
    |   |   |
    |   |   |---Constant(6) - scope-56
    |   |
    |   |---Greater Than[boolean] - scope-60
    |       |
    |       |---Project[int][2] - scope-58
    |       |
    |       |---Constant(6) - scope-59
    |
    |---Load(hdfs://zly1.sh.intel.com:8020/tmp/temp-1920285848/tmp1444529161:org.apache.pig.impl.io.InterStorage) - scope-65--------
{code}

Scope-18(Store) and Scope-19(Load)  is not necessary. It should be removed.  

Scope-39(Store) and Scope-40(Load)  is not necessary. It should be removed.  

Scope-64(Store) and Scope-65(Load) is not necessary. It should be removed.  





--
This message was sent by Atlassian JIRA
(v6.3.4#6332)