Posted to dev@pig.apache.org by "liyunzhang_intel (JIRA)" <ji...@apache.org> on 2015/06/15 16:34:01 UTC

[jira] [Commented] (PIG-4594) Enable "TestMultiQuery" in spark mode

    [ https://issues.apache.org/jira/browse/PIG-4594?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14586116#comment-14586116 ] 

liyunzhang_intel commented on PIG-4594:
---------------------------------------

Multiquery is enabled by default. To disable it, run Pig with the following option:
#pig -no_multiquery script

When POSplit is encountered in SparkCompiler#visitSplit, a new SparkOperator will be generated. For example:

{code}
testSplit.pig
A = load './testSplit.txt' as (f1:int, f2:int,f3:int);
split A into x if f1<7, y if f2==5, z if (f3<6 or f3>6);
store x into './testSplit_x.out';
store y into './testSplit_y.out';
store z into './testSplit_z.out';
{code}

{code}
cat bin/testSplit.txt
1     2     3
4     5     6
7     8     9
{code}
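
For orientation, the routing that the split statement performs on these three rows can be reproduced with a small Python sketch. It only illustrates the semantics of the script and is not Pig code: the rows are assumed to be already parsed into integer tuples (f1, f2, f3), and the name split_rows is made up for this example.

```python
# Toy re-implementation of the SPLIT statement in testSplit.pig.
# Assumption: rows are already parsed into (f1, f2, f3) integer tuples.
rows = [(1, 2, 3), (4, 5, 6), (7, 8, 9)]

# One predicate per output relation, mirroring the conditions in the script.
conditions = {
    "x": lambda f1, f2, f3: f1 < 7,
    "y": lambda f1, f2, f3: f2 == 5,
    "z": lambda f1, f2, f3: f3 < 6 or f3 > 6,
}


def split_rows(rows, conditions):
    """Send each row to every relation whose condition it satisfies."""
    out = {name: [] for name in conditions}
    for row in rows:
        for name, cond in conditions.items():
            if cond(*row):
                out[name].append(row)
    return out


result = split_rows(rows, conditions)
for name in ("x", "y", "z"):
    print(name, result[name])
```

A row can land in more than one output (the second row goes to both x and y), which is why all three filters hang off the same Split operator in the plans that follow.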

PhysicalPlan
{code}
x: Store(hdfs://zly2.sh.intel.com:8020/user/root/testSplit_x.out:org.apache.pig.builtin.PigStorage) - scope-16
|
|---x: Filter[bag] - scope-12
    |   |
    |   Less Than[boolean] - scope-15
    |   |
    |   |---Project[int][0] - scope-13
    |   |
    |   |---Constant(7) - scope-14
    |
    |---1-1: Split - scope-11
        |
        |---A: New For Each(false,false,false)[bag] - scope-10
            |   |
            |   Cast[int] - scope-2
            |   |
            |   |---Project[bytearray][0] - scope-1
            |   |
            |   Cast[int] - scope-5
            |   |
            |   |---Project[bytearray][1] - scope-4
            |   |
            |   Cast[int] - scope-8
            |   |
            |   |---Project[bytearray][2] - scope-7
            |
            |---A: Load(hdfs://zly2.sh.intel.com:8020/user/root/testSplit.txt:org.apache.pig.builtin.PigStorage) - scope-0

y: Store(hdfs://zly2.sh.intel.com:8020/user/root/testSplit_y.out:org.apache.pig.builtin.PigStorage) - scope-21
|
|---y: Filter[bag] - scope-17
    |   |
    |   Equal To[boolean] - scope-20
    |   |
    |   |---Project[int][1] - scope-18
    |   |
    |   |---Constant(5) - scope-19
    |
    |---1-1: Split - scope-11
        |
        |---A: New For Each(false,false,false)[bag] - scope-10
            |   |
            |   Cast[int] - scope-2
            |   |
            |   |---Project[bytearray][0] - scope-1
            |   |
            |   Cast[int] - scope-5
            |   |
            |   |---Project[bytearray][1] - scope-4
            |   |
            |   Cast[int] - scope-8
            |   |
            |   |---Project[bytearray][2] - scope-7
            |
            |---A: Load(hdfs://zly2.sh.intel.com:8020/user/root/testSplit.txt:org.apache.pig.builtin.PigStorage) - scope-0

z: Store(hdfs://zly2.sh.intel.com:8020/user/root/testSplit_z.out:org.apache.pig.builtin.PigStorage) - scope-30
|
|---z: Filter[bag] - scope-22
    |   |
    |   Or[boolean] - scope-29
    |   |
    |   |---Less Than[boolean] - scope-25
    |   |   |
    |   |   |---Project[int][2] - scope-23
    |   |   |
    |   |   |---Constant(6) - scope-24
    |   |
    |   |---Greater Than[boolean] - scope-28
    |       |
    |       |---Project[int][2] - scope-26
    |       |
    |       |---Constant(6) - scope-27
    |
    |---1-1: Split - scope-11
        |
        |---A: New For Each(false,false,false)[bag] - scope-10
            |   |
            |   Cast[int] - scope-2
            |   |
            |   |---Project[bytearray][0] - scope-1
            |   |
            |   Cast[int] - scope-5
            |   |
            |   |---Project[bytearray][1] - scope-4
            |   |
            |   Cast[int] - scope-8
            |   |
            |   |---Project[bytearray][2] - scope-7
            |
            |---A: Load(hdfs://zly2.sh.intel.com:8020/user/root/testSplit.txt:org.apache.pig.builtin.PigStorage) - scope-0

{code}

SparkPlan
{code}
before multiquery optimization:
scope-31->scope-34 scope-36 scope-38
scope-34
scope-36
scope-38
#--------------------------------------------------
# Spark Plan                                 
#--------------------------------------------------

Spark node scope-31
Store(hdfs://zly2.sh.intel.com:8020/tmp/temp160363562/tmp-326156769:org.apache.pig.impl.io.InterStorage) - scope-32
|
|---A: New For Each(false,false,false)[bag] - scope-10
    |   |
    |   Cast[int] - scope-2
    |   |
    |   |---Project[bytearray][0] - scope-1
    |   |
    |   Cast[int] - scope-5
    |   |
    |   |---Project[bytearray][1] - scope-4
    |   |
    |   Cast[int] - scope-8
    |   |
    |   |---Project[bytearray][2] - scope-7
    |
    |---A: Load(hdfs://zly2.sh.intel.com:8020/user/root/testSplit.txt:org.apache.pig.builtin.PigStorage) - scope-0--------

Spark node scope-34
x: Store(hdfs://zly2.sh.intel.com:8020/user/root/testSplit_x.out:org.apache.pig.builtin.PigStorage) - scope-16
|
|---x: Filter[bag] - scope-12
    |   |
    |   Less Than[boolean] - scope-15
    |   |
    |   |---Project[int][0] - scope-13
    |   |
    |   |---Constant(7) - scope-14
    |
    |---Load(hdfs://zly2.sh.intel.com:8020/tmp/temp160363562/tmp-326156769:org.apache.pig.impl.io.InterStorage) - scope-33--------

Spark node scope-36
y: Store(hdfs://zly2.sh.intel.com:8020/user/root/testSplit_y.out:org.apache.pig.builtin.PigStorage) - scope-21
|
|---y: Filter[bag] - scope-17
    |   |
    |   Equal To[boolean] - scope-20
    |   |
    |   |---Project[int][1] - scope-18
    |   |
    |   |---Constant(5) - scope-19
    |
    |---Load(hdfs://zly2.sh.intel.com:8020/tmp/temp160363562/tmp-326156769:org.apache.pig.impl.io.InterStorage) - scope-35--------

Spark node scope-38
z: Store(hdfs://zly2.sh.intel.com:8020/user/root/testSplit_z.out:org.apache.pig.builtin.PigStorage) - scope-30
|
|---z: Filter[bag] - scope-22
    |   |
    |   Or[boolean] - scope-29
    |   |
    |   |---Less Than[boolean] - scope-25
    |   |   |
    |   |   |---Project[int][2] - scope-23
    |   |   |
    |   |   |---Constant(6) - scope-24
    |   |
    |   |---Greater Than[boolean] - scope-28
    |       |
    |       |---Project[int][2] - scope-26
    |       |
    |       |---Constant(6) - scope-27
    |
    |---Load(hdfs://zly2.sh.intel.com:8020/tmp/temp160363562/tmp-326156769:org.apache.pig.impl.io.InterStorage) - scope-37--------

{code}

After multiquery optimization:
scope-39
{code}
Split - scope-39
|   |
|   x: Store(hdfs://zly2.sh.intel.com:8020/user/root/testSplit_x.out:org.apache.pig.builtin.PigStorage) - scope-16
|   |
|   |---x: Filter[bag] - scope-12
|       |   |
|       |   Less Than[boolean] - scope-15
|       |   |
|       |   |---Project[int][0] - scope-13
|       |   |
|       |   |---Constant(7) - scope-14
|   |
|   y: Store(hdfs://zly2.sh.intel.com:8020/user/root/testSplit_y.out:org.apache.pig.builtin.PigStorage) - scope-21
|   |
|   |---y: Filter[bag] - scope-17
|       |   |
|       |   Equal To[boolean] - scope-20
|       |   |
|       |   |---Project[int][1] - scope-18
|       |   |
|       |   |---Constant(5) - scope-19
|   |
|   z: Store(hdfs://zly2.sh.intel.com:8020/user/root/testSplit_z.out:org.apache.pig.builtin.PigStorage) - scope-30
|   |
|   |---z: Filter[bag] - scope-22
|       |   |
|       |   Or[boolean] - scope-29
|       |   |
|       |   |---Less Than[boolean] - scope-25
|       |   |   |
|       |   |   |---Project[int][2] - scope-23
|       |   |   |
|       |   |   |---Constant(6) - scope-24
|       |   |
|       |   |---Greater Than[boolean] - scope-28
|       |       |
|       |       |---Project[int][2] - scope-26
|       |       |
|       |       |---Constant(6) - scope-27
|
|---A: New For Each(false,false,false)[bag] - scope-10
    |   |
    |   Cast[int] - scope-2
    |   |
    |   |---Project[bytearray][0] - scope-1
    |   |
    |   Cast[int] - scope-5
    |   |
    |   |---Project[bytearray][1] - scope-4
    |   |
    |   Cast[int] - scope-8
    |   |
    |   |---Project[bytearray][2] - scope-7
    |
    |---A: Load(hdfs://zly2.sh.intel.com:8020/user/root/testSplit.txt:org.apache.pig.builtin.PigStorage) - scope-0

{code}
In the above case, MultiQueryOptimizerSpark removes the unnecessary loads (scope-33, scope-35, scope-37) and store (scope-32), and merges the 4 Spark nodes (scope-31, scope-34, scope-36, scope-38) into 1 Spark node (scope-39).
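
The merge described above can be pictured with a toy model in Python. This is a sketch under simplifying assumptions, not Pig's real data structures: each Spark node is reduced to a flat list of operator labels ordered from input to output, and the function name merge and the dictionary layout are invented for the illustration.

```python
# Toy model of the merge described above; not Pig's real data structures.
# Each Spark node is a list of operator labels ordered from input to output.
TMP = "tmp-326156769"  # temporary file shared by the four nodes

plan = {
    "scope-31": ["Load(testSplit.txt) scope-0", "ForEach scope-10",
                 f"Store({TMP}) scope-32"],
    "scope-34": [f"Load({TMP}) scope-33", "Filter scope-12",
                 "Store(testSplit_x.out) scope-16"],
    "scope-36": [f"Load({TMP}) scope-35", "Filter scope-17",
                 "Store(testSplit_y.out) scope-21"],
    "scope-38": [f"Load({TMP}) scope-37", "Filter scope-22",
                 "Store(testSplit_z.out) scope-30"],
}
successors = {"scope-31": ["scope-34", "scope-36", "scope-38"]}


def merge(plan, successors, splitter, merged_id):
    """Collapse a splitter and its successors into one node with a split."""
    # Drop the temporary Store that ends the splitter's plan.
    upstream = plan[splitter][:-1]
    # Drop the temporary Load that starts each successor's plan.
    branches = [plan[s][1:] for s in successors[splitter]]
    return {merged_id: {"input": upstream, "split": branches}}


merged = merge(plan, successors, "scope-31", "scope-39")
print(merged)
```

The result has a single node, scope-39, whose input is the Load and ForEach of A and whose three branches are the filter-and-store chains of x, y and z. No operator refers to the temporary file any more.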

PIG-4594.patch adds NoopFilterRemover.java.
NoopFilterRemover removes the no-op filters produced by POSplit, that is, filters whose condition is Constant(true). In the example below,
NoopFilterRemover removes the filters scope-15 and scope-29.
Before NoopFilterRemover#visit is executed:
{code}
#--------------------------------------------------
# Spark Plan                                  
#--------------------------------------------------

Spark node scope-42
Store(file:/tmp/temp1964795825/tmp-1375252005:org.apache.pig.impl.io.InterStorage) - scope-43
|
|---a: New For Each(false,false,false,false)[bag] - scope-13
    |   |
    |   Cast[chararray] - scope-2
    |   |
    |   |---Project[bytearray][0] - scope-1
    |   |
    |   Cast[chararray] - scope-5
    |   |
    |   |---Project[bytearray][1] - scope-4
    |   |
    |   Cast[int] - scope-8
    |   |
    |   |---Project[bytearray][2] - scope-7
    |   |
    |   Cast[int] - scope-11
    |   |
    |   |---Project[bytearray][3] - scope-10
    |
    |---a: Load(file:///home/zly/prj/oss/kellyzly/pig/pig-976.txt:org.apache.pig.builtin.PigStorage) - scope-0--------

Spark node scope-45
d: Store(file:///home/zly/prj/oss/kellyzly/pig/output1:org.apache.pig.builtin.PigStorage) - scope-28
|
|---d: New For Each(false,false)[bag] - scope-27
    |   |
    |   Project[int][0] - scope-21
    |   |
    |   POUserFunc(org.apache.pig.builtin.LongSum)[long] - scope-25
    |   |
    |   |---Project[bag][3] - scope-24
    |       |
    |       |---Project[bag][1] - scope-23
    |
    |---b: Package(Packager)[tuple]{int} - scope-18
        |
        |---b: Global Rearrange[tuple] - scope-17
            |
            |---b: Local Rearrange[tuple]{int}(false) - scope-19
                |   |
                |   Project[int][2] - scope-20
                |
                |---a: Filter[bag] - scope-15
                    |   |
                    |   Constant(true) - scope-16
                    |
                    |---Load(file:/tmp/temp1964795825/tmp-1375252005:org.apache.pig.impl.io.InterStorage) - scope-44--------

Spark node scope-47
e: Store(file:///home/zly/prj/oss/kellyzly/pig/output2:org.apache.pig.builtin.PigStorage) - scope-41
|
|---e: New For Each(false,false)[bag] - scope-40
    |   |
    |   POUserFunc(org.apache.pig.builtin.COUNT)[long] - scope-36
    |   |
    |   |---Project[bag][1] - scope-35
    |   |
    |   Project[int][0] - scope-38
    |
    |---c: Package(Packager)[tuple]{int} - scope-32
        |
        |---c: Global Rearrange[tuple] - scope-31
            |
            |---c: Local Rearrange[tuple]{int}(false) - scope-33
                |   |
                |   Project[int][3] - scope-34
                |
                |---a: Filter[bag] - scope-29
                    |   |
                    |   Constant(true) - scope-30
                    |
                    |---Load(file:/tmp/temp1964795825/tmp-1375252005:org.apache.pig.impl.io.InterStorage) - scope-46--------

{code}

After NoopFilterRemover#visit is executed:
{code}
#--------------------------------------------------
# Spark Plan                                  
#--------------------------------------------------

Spark node scope-42
Store(file:/tmp/temp1964795825/tmp-1375252005:org.apache.pig.impl.io.InterStorage) - scope-43
|
|---a: New For Each(false,false,false,false)[bag] - scope-13
    |   |
    |   Cast[chararray] - scope-2
    |   |
    |   |---Project[bytearray][0] - scope-1
    |   |
    |   Cast[chararray] - scope-5
    |   |
    |   |---Project[bytearray][1] - scope-4
    |   |
    |   Cast[int] - scope-8
    |   |
    |   |---Project[bytearray][2] - scope-7
    |   |
    |   Cast[int] - scope-11
    |   |
    |   |---Project[bytearray][3] - scope-10
    |
    |---a: Load(file:///home/zly/prj/oss/kellyzly/pig/pig-976.txt:org.apache.pig.builtin.PigStorage) - scope-0--------

Spark node scope-45
d: Store(file:///home/zly/prj/oss/kellyzly/pig/output1:org.apache.pig.builtin.PigStorage) - scope-28
|
|---d: New For Each(false,false)[bag] - scope-27
    |   |
    |   Project[int][0] - scope-21
    |   |
    |   POUserFunc(org.apache.pig.builtin.LongSum)[long] - scope-25
    |   |
    |   |---Project[bag][3] - scope-24
    |       |
    |       |---Project[bag][1] - scope-23
    |
    |---b: Package(Packager)[tuple]{int} - scope-18
        |
        |---b: Global Rearrange[tuple] - scope-17
            |
            |---b: Local Rearrange[tuple]{int}(false) - scope-19
                |   |
                |   Project[int][2] - scope-20
                |
                |---Load(file:/tmp/temp1964795825/tmp-1375252005:org.apache.pig.impl.io.InterStorage) - scope-44--------

Spark node scope-47
e: Store(file:///home/zly/prj/oss/kellyzly/pig/output2:org.apache.pig.builtin.PigStorage) - scope-41
|
|---e: New For Each(false,false)[bag] - scope-40
    |   |
    |   POUserFunc(org.apache.pig.builtin.COUNT)[long] - scope-36
    |   |
    |   |---Project[bag][1] - scope-35
    |   |
    |   Project[int][0] - scope-38
    |
    |---c: Package(Packager)[tuple]{int} - scope-32
        |
        |---c: Global Rearrange[tuple] - scope-31
            |
            |---c: Local Rearrange[tuple]{int}(false) - scope-33
                |   |
                |   Project[int][3] - scope-34
                |
                |---Load(file:/tmp/temp1964795825/tmp-1375252005:org.apache.pig.impl.io.InterStorage) - scope-46--------

{code}
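
The effect of the filter removal on Spark node scope-45 can be sketched as follows. This is a toy model in Python and not the NoopFilterRemover source: a plan is assumed to be a flat list of (operator, detail) pairs ordered from input to output, and the helper names is_noop_filter and remove_noop_filters are hypothetical.

```python
# Toy model of the no-op filter removal; not the NoopFilterRemover source.
# A plan is a list of (operator, detail) pairs ordered from input to output.
node_45 = [
    ("Load", "tmp-1375252005 scope-44"),
    ("Filter", "Constant(true) scope-15"),
    ("LocalRearrange", "scope-19"),
    ("GlobalRearrange", "scope-17"),
    ("Package", "scope-18"),
    ("ForEach", "scope-27"),
    ("Store", "output1 scope-28"),
]


def is_noop_filter(op):
    """A filter whose whole condition is the constant true passes every row."""
    kind, detail = op
    return kind == "Filter" and detail.startswith("Constant(true)")


def remove_noop_filters(plan):
    """Return the plan without no-op filters; their neighbours become adjacent."""
    return [op for op in plan if not is_noop_filter(op)]


cleaned = remove_noop_filters(node_45)
print([kind for kind, _ in cleaned])
```

After the call, the Load feeds the Local Rearrange directly, which matches the second plan dump above. A filter with a real condition is left untouched.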

In MultiQueryOptimizerSpark, we divide all the cases into 3 situations.
Two concepts, "splitter" and "splittee", are introduced here. A splitter is a SparkOperator that contains a POSplit. A splittee is a SparkOperator that is a successor of the splitter.
     1. If a splittee has more than 1 predecessor, multiquery optimization is not applied. For example
          TestMultiQuery#testMultiQueryWithFJ_2:
          {code}
          a = load './passwd' using PigStorage(':') as (uname:chararray, passwd:chararray, uid:int, gid:int);
          b = load './passwd' using PigStorage(':') as (uname:chararray, passwd:chararray, uid:int, gid:int);
          c = filter a by uid > 5;
          store c into './testMultiQueryWithFJ_2.output1';
          d = filter b by gid > 10;
          store d into './testMultiQueryWithFJ_2.output2';
          e = join c by gid, d by gid using 'repl';
          store e into './testMultiQueryWithFJ_2.output3';
          {code}
         
          The predecessors of scope-69 are scope-57 and scope-61. If we merged the physical plans of scope-57 and scope-61 into the physical plan of scope-69 and removed scope-57 and scope-61, as the "after multiquery optimization" plan below shows, scope-60 and scope-64 would no longer find their predecessors (scope-57, scope-61). For this reason, this kind of case cannot be multiquery optimized.
          before multiquery optimization:
          {code}
          scope-57->scope-60 scope-69
          scope-60
          scope-61->scope-64 scope-69
          scope-64
          scope-69
          #--------------------------------------------------
          # Spark Plan                                 
          #--------------------------------------------------
         
          Spark node scope-57
          Store(hdfs://zly2.sh.intel.com:8020/tmp/temp-1880430179/tmp-1373171069:org.apache.pig.impl.io.InterStorage) - scope-58
          |
          |---c: Filter[bag] - scope-14
              |   |
              |   Greater Than[boolean] - scope-17
              |   |
              |   |---Project[int][2] - scope-15
              |   |
              |   |---Constant(5) - scope-16
              |
              |---a: New For Each(false,false,false,false)[bag] - scope-13
                  |   |
                  |   Cast[chararray] - scope-2
                  |   |
                  |   |---Project[bytearray][0] - scope-1
                  |   |
                  |   Cast[chararray] - scope-5
                  |   |
                  |   |---Project[bytearray][1] - scope-4
                  |   |
                  |   Cast[int] - scope-8
                  |   |
                  |   |---Project[bytearray][2] - scope-7
                  |   |
                  |   Cast[int] - scope-11
                  |   |
                  |   |---Project[bytearray][3] - scope-10
                  |
                  |---a: Load(hdfs://zly2.sh.intel.com:8020/user/root/passwd:PigStorage(':')) - scope-0--------
         
          Spark node scope-60
          c: Store(hdfs://zly2.sh.intel.com:8020/user/root/testMultiQueryWithFJ_2.output1:org.apache.pig.builtin.PigStorage) - scope-21
          |
          |---Load(hdfs://zly2.sh.intel.com:8020/tmp/temp-1880430179/tmp-1373171069:org.apache.pig.impl.io.InterStorage) - scope-59--------
         
          Spark node scope-69
          e: Store(hdfs://zly2.sh.intel.com:8020/user/root/testMultiQueryWithFJ_2.output3:org.apache.pig.builtin.PigStorage) - scope-56
          |
          |---e: FRJoin[tuple] - scope-50
              |   |
              |   Project[int][3] - scope-48
              |   |
              |   Project[int][3] - scope-49
              |
              |---Load(hdfs://zly2.sh.intel.com:8020/tmp/temp-1880430179/tmp-1373171069:org.apache.pig.impl.io.InterStorage) - scope-65
              |
              |---Load(hdfs://zly2.sh.intel.com:8020/tmp/temp-1880430179/tmp-654400409:org.apache.pig.impl.io.InterStorage) - scope-67--------
         
          Spark node scope-61
          Store(hdfs://zly2.sh.intel.com:8020/tmp/temp-1880430179/tmp-654400409:org.apache.pig.impl.io.InterStorage) - scope-62
          |
          |---d: Filter[bag] - scope-36
              |   |
              |   Greater Than[boolean] - scope-39
              |   |
              |   |---Project[int][3] - scope-37
              |   |
              |   |---Constant(10) - scope-38
              |
              |---b: New For Each(false,false,false,false)[bag] - scope-35
                  |   |
                  |   Cast[chararray] - scope-24
                  |   |
                  |   |---Project[bytearray][0] - scope-23
                  |   |
                  |   Cast[chararray] - scope-27
                  |   |
                  |   |---Project[bytearray][1] - scope-26
                  |   |
                  |   Cast[int] - scope-30
                  |   |
                  |   |---Project[bytearray][2] - scope-29
                  |   |
                  |   Cast[int] - scope-33
                  |   |
                  |   |---Project[bytearray][3] - scope-32
                  |
                  |---b: Load(hdfs://zly2.sh.intel.com:8020/user/root/passwd:PigStorage(':')) - scope-22--------
         
          Spark node scope-64
          d: Store(hdfs://zly2.sh.intel.com:8020/user/root/testMultiQueryWithFJ_2.output2:org.apache.pig.builtin.PigStorage) - scope-43
          |
          |---Load(hdfs://zly2.sh.intel.com:8020/tmp/temp-1880430179/tmp-654400409:org.apache.pig.impl.io.InterStorage) - scope-63--------
          {code}
         
          after multiquery optimization (what the plan would look like if the merge were applied; scope-60 and scope-64 lose their inputs):
          {code}
          scope-60
          scope-64
          scope-69
          #--------------------------------------------------
          # Spark Plan                                 
          #--------------------------------------------------
         
          Spark node scope-60
          c: Store(hdfs://zly2.sh.intel.com:8020/user/root/testMultiQueryWithFJ_2.output1:org.apache.pig.builtin.PigStorage) - scope-21
          |
          |---Load(hdfs://zly2.sh.intel.com:8020/tmp/temp-1880430179/tmp-1373171069:org.apache.pig.impl.io.InterStorage) - scope-59--------
         
          Spark node scope-69
          e: Store(hdfs://zly2.sh.intel.com:8020/user/root/testMultiQueryWithFJ_2.output3:org.apache.pig.builtin.PigStorage) - scope-56
          |
          |---e: FRJoin[tuple] - scope-50
              |   |
              |   Project[int][3] - scope-48
              |   |
              |   Project[int][3] - scope-49
              |
              |---c: Filter[bag] - scope-14
              |   |
              |   Greater Than[boolean] - scope-17
              |   |
              |   |---Project[int][2] - scope-15
              |   |
              |   |---Constant(5) - scope-16
              |   |
              |   |---a: New For Each(false,false,false,false)[bag] - scope-13
              |   |      |
              |   |      Cast[chararray] - scope-2
              |   |      |
              |   |      |---Project[bytearray][0] - scope-1
              |   |      |
              |   |      Cast[chararray] - scope-5
              |   |      |
              |   |      |---Project[bytearray][1] - scope-4
              |   |      |
              |   |      Cast[int] - scope-8
              |   |      |
              |   |      |---Project[bytearray][2] - scope-7
              |   |      |
              |   |      Cast[int] - scope-11
              |   |      |
              |   |      |---Project[bytearray][3] - scope-10
              |   |
              |   |---a: Load(hdfs://zly2.sh.intel.com:8020/user/root/passwd:PigStorage(':')) - scope-0--------
              |
              |---d: Filter[bag] - scope-36
                    |   |
                    |   Greater Than[boolean] - scope-39
                    |   |
                    |   |---Project[int][3] - scope-37
                    |   |
                    |   |---Constant(10) - scope-38
                    |
                    |---b: New For Each(false,false,false,false)[bag] - scope-35
                         |   |
                         |   Cast[chararray] - scope-24
                         |   |
                         |   |---Project[bytearray][0] - scope-23
                         |   |
                         |   Cast[chararray] - scope-27
                         |   |
                         |   |---Project[bytearray][1] - scope-26
                         |   |
                         |   Cast[int] - scope-30
                         |   |
                         |   |---Project[bytearray][2] - scope-29
                         |   |
                         |   Cast[int] - scope-33
                         |   |
                         |   |---Project[bytearray][3] - scope-32
                         |
                         |---b: Load(hdfs://zly2.sh.intel.com:8020/user/root/passwd:PigStorage(':')) - scope-22--------
         
         
         
          Spark node scope-64
          d: Store(hdfs://zly2.sh.intel.com:8020/user/root/testMultiQueryWithFJ_2.output2:org.apache.pig.builtin.PigStorage) - scope-43
          |
          |---Load(hdfs://zly2.sh.intel.com:8020/tmp/temp-1880430179/tmp-654400409:org.apache.pig.impl.io.InterStorage) - scope-63--------
          {code}
         
     2. If there is only 1 splittee, it is merged directly into the splitter:
        {code}
          A = load './testSplit.txt' as (f1:int, f2:int,f3:int);
          split A into x if f1<7, y if f2==5, z if (f3<6 or f3>6);
          store x into './testSplit_x.out';
        {code}
          before multiquery optimization:
          {code}
          scope-17->scope-20
          scope-20
          #--------------------------------------------------
          # Spark Plan                                 
          #--------------------------------------------------
         
          Spark node scope-17
          Store(hdfs://zly2.sh.intel.com:8020/tmp/temp756348234/tmp748022356:org.apache.pig.impl.io.InterStorage) - scope-18
          |
          |---A: New For Each(false,false,false)[bag] - scope-10
              |   |
              |   Cast[int] - scope-2
              |   |
              |   |---Project[bytearray][0] - scope-1
              |   |
              |   Cast[int] - scope-5
              |   |
              |   |---Project[bytearray][1] - scope-4
              |   |
              |   Cast[int] - scope-8
              |   |
              |   |---Project[bytearray][2] - scope-7
              |
              |---A: Load(hdfs://zly2.sh.intel.com:8020/user/root/testSplit.txt:org.apache.pig.builtin.PigStorage) - scope-0--------
         
          Spark node scope-20
          x: Store(hdfs://zly2.sh.intel.com:8020/user/root/testSplit_x.out:org.apache.pig.builtin.PigStorage) - scope-16
          |
          |---x: Filter[bag] - scope-12
              |   |
              |   Less Than[boolean] - scope-15
              |   |
              |   |---Project[int][0] - scope-13
              |   |
              |   |---Constant(7) - scope-14
              |
              |---Load(hdfs://zly2.sh.intel.com:8020/tmp/temp756348234/tmp748022356:org.apache.pig.impl.io.InterStorage) - scope-19--------
               {code}
         
          after multiquery optimization:
          {code}
          scope-17
          #--------------------------------------------------
          # Spark Plan                                 
          #--------------------------------------------------
         
          Spark node scope-17
          x: Store(hdfs://zly2.sh.intel.com:8020/user/root/testSplit_x.out:org.apache.pig.builtin.PigStorage) - scope-16
          |
          |---x: Filter[bag] - scope-12
              |   |
              |   Less Than[boolean] - scope-15
              |   |
              |   |---Project[int][0] - scope-13
              |   |
              |   |---Constant(7) - scope-14
              |
              |---A: New For Each(false,false,false)[bag] - scope-10
                  |   |
                  |   Cast[int] - scope-2
                  |   |
                  |   |---Project[bytearray][0] - scope-1
                  |   |
                  |   Cast[int] - scope-5
                  |   |
                  |   |---Project[bytearray][1] - scope-4
                  |   |
                  |   Cast[int] - scope-8
                  |   |
                  |   |---Project[bytearray][2] - scope-7
                  |
                  |---A: Load(hdfs://zly2.sh.intel.com:8020/user/root/testSplit.txt:org.apache.pig.builtin.PigStorage) - scope-0--------
          {code}
         
     3. If there is more than 1 splittee and case 1 does not apply, we need to create a split of type POSplit, merge the physical plans of all the splittees into the physical plan of the split, and remove the splittees.
        {code}
          a = load './passwd' using PigStorage(':') as (uname:chararray, passwd:chararray, uid:int, gid:int);
          b = filter a by uid < 5;
          store b into './multiquery.b.out';
          c = foreach b generate uname;
          store c into './multiquery.out';
          {code}
         
          In this case after multiquery optimization, we create POSplit (scope-34) and merge all the splittees to the sp
         
          before multiquery optimization:
{code}
          scope-28->scope-31 scope-33
          scope-31
          scope-33
          #--------------------------------------------------
          # Spark Plan                                 
          #--------------------------------------------------
         
          Spark node scope-28
          Store(hdfs://zly2.sh.intel.com:8020/tmp/temp-1156248777/tmp1448287392:org.apache.pig.impl.io.InterStorage) - scope-29
          |
          |---b: Filter[bag] - scope-14
              |   |
              |   Less Than[boolean] - scope-17
              |   |
              |   |---Project[int][2] - scope-15
              |   |
              |   |---Constant(5) - scope-16
              |
              |---a: New For Each(false,false,false,false)[bag] - scope-13
                       |   |
                       |   Cast[chararray] - scope-2
                       |   |
                       |   |---Project[bytearray][0] - scope-1
                       |   |
                       |   Cast[chararray] - scope-5
                       |   |
                       |   |---Project[bytearray][1] - scope-4
                       |   |
                       |   Cast[int] - scope-8
                       |   |
                       |   |---Project[bytearray][2] - scope-7
                       |   |
                       |   Cast[int] - scope-11
                       |   |
                       |   |---Project[bytearray][3] - scope-10
                       |
                       |---a: Load(hdfs://zly2.sh.intel.com:8020/user/root/passwd:PigStorage(':')) - scope-0--------
         
          Spark node scope-31
          b: Store(hdfs://zly2.sh.intel.com:8020/user/root/multiquery.b.out:org.apache.pig.builtin.PigStorage) - scope-21
          |
          |---Load(hdfs://zly2.sh.intel.com:8020/tmp/temp-1156248777/tmp1448287392:org.apache.pig.impl.io.InterStorage) - scope-30--------
         
          Spark node scope-33
          c: Store(hdfs://zly2.sh.intel.com:8020/user/root/multiquery.out:org.apache.pig.builtin.PigStorage) - scope-27
          |
          |---c: New For Each(false)[bag] - scope-26
              |   |
              |   Project[chararray][0] - scope-24
              |
              |---Load(hdfs://zly2.sh.intel.com:8020/tmp/temp-1156248777/tmp1448287392:org.apache.pig.impl.io.InterStorage) - scope-32--------
 {code}
          after multiquery optimization:
          {code}
          scope-28
          #--------------------------------------------------
          # Spark Plan                                 
          #--------------------------------------------------
         
          Spark node scope-28
          Split - scope-34
          |   |
          |   b: Store(hdfs://zly2.sh.intel.com:8020/user/root/multiquery.b.out:org.apache.pig.builtin.PigStorage) - scope-21
          |   |
          |   c: Store(hdfs://zly2.sh.intel.com:8020/user/root/multiquery.out:org.apache.pig.builtin.PigStorage) - scope-27
          |   |
          |   |---c: New For Each(false)[bag] - scope-26
          |       |   |
          |       |   Project[chararray][0] - scope-24
          |
          |---b: Filter[bag] - scope-14
              |   |
              |   Less Than[boolean] - scope-17
              |   |
              |   |---Project[int][2] - scope-15
              |   |
              |   |---Constant(5) - scope-16
              |
              |---a: New For Each(false,false,false,false)[bag] - scope-13
                  |   |
                  |   Cast[chararray] - scope-2
                  |   |
                  |   |---Project[bytearray][0] - scope-1
                  |   |
                  |   Cast[chararray] - scope-5
                  |   |
                  |   |---Project[bytearray][1] - scope-4
                  |   |
                  |   Cast[int] - scope-8
                  |   |
                  |   |---Project[bytearray][2] - scope-7
                  |   |
                  |   Cast[int] - scope-11
                  |   |
                  |   |---Project[bytearray][3] - scope-10
                  |
                  |---a: Load(hdfs://zly2.sh.intel.com:8020/user/root/passwd:PigStorage(':')) - scope-0--------
{code}
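
The choice between the 3 situations can be summarised with a toy decision procedure in Python. It is a sketch of the rules as stated in this comment, not the MultiQueryOptimizerSpark code: a plan is reduced to its edges (node to list of successors, copied from the "before" listings above), the function names are hypothetical, and the "nothing to merge" result for a node without successors is an addition made for completeness.

```python
# Toy decision procedure for the three situations; not MultiQueryOptimizerSpark.
# A plan is described only by its edges: node -> list of successors.


def predecessors(edges, node):
    """All nodes that have `node` among their successors."""
    return [src for src, dsts in edges.items() if node in dsts]


def decide(edges, splitter):
    """Return which of the three situations applies to `splitter`."""
    splittees = edges.get(splitter, [])
    if not splittees:
        return "nothing to merge"
    # Situation 1: some splittee also depends on another node.
    if any(len(predecessors(edges, s)) > 1 for s in splittees):
        return "case 1: skip"
    # Situation 2: a single splittee is merged without a POSplit.
    if len(splittees) == 1:
        return "case 2: merge without POSplit"
    # Situation 3: several splittees are merged under a new POSplit.
    return "case 3: merge under POSplit"


# Edges copied from the three "before multiquery optimization" listings.
fj2 = {"scope-57": ["scope-60", "scope-69"],
       "scope-61": ["scope-64", "scope-69"]}
single = {"scope-17": ["scope-20"]}
multi = {"scope-28": ["scope-31", "scope-33"]}

print(decide(fj2, "scope-57"))
print(decide(single, "scope-17"))
print(decide(multi, "scope-28"))
```

In the testMultiQueryWithFJ_2 plan, scope-69 has two predecessors, so both scope-57 and scope-61 fall into case 1 and the plan is left as it is.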

> Enable "TestMultiQuery" in spark mode
> -------------------------------------
>
>                 Key: PIG-4594
>                 URL: https://issues.apache.org/jira/browse/PIG-4594
>             Project: Pig
>          Issue Type: Sub-task
>          Components: spark
>            Reporter: liyunzhang_intel
>            Assignee: liyunzhang_intel
>             Fix For: spark-branch
>
>         Attachments: PIG-4594.patch
>
>
> in https://builds.apache.org/job/Pig-spark/211/#showFailuresLink,it shows that 
> following unit test failures fail:
> org.apache.pig.test.TestMultiQuery.testMultiQueryJiraPig1068
> org.apache.pig.test.TestMultiQuery.testMultiQueryJiraPig1157
> org.apache.pig.test.TestMultiQuery.testMultiQueryJiraPig1252
> org.apache.pig.test.TestMultiQuery.testMultiQueryJiraPig1438


