Posted to dev@pig.apache.org by "liyunzhang_intel (JIRA)" <ji...@apache.org> on 2016/05/18 08:48:12 UTC

[jira] [Commented] (PIG-4898) Fix unit test failure after PIG-4771's patch was checked in

    [ https://issues.apache.org/jira/browse/PIG-4898?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15288630#comment-15288630 ] 

liyunzhang_intel commented on PIG-4898:
---------------------------------------

The following two unit tests fail, for the reasons explained below:
1. org.apache.pig.test.TestFRJoin.testDistinctFRJoin
2. org.apache.pig.test.TestPigRunner.simpleMultiQueryTest3

#1 fails because the line marked with + below is missing, which causes an NPE.
SparkCompiler#visitDistinct
{code}
public void visitDistinct(PODistinct op) throws VisitorException {
    try {
        addToPlan(op);
+       phyToSparkOpMap.put(op, curSparkOp);
    } catch (Exception e) {
        int errCode = 2034;
    ....
{code}
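
As a rough illustration of why the missing put can surface as an NPE, here is a minimal sketch. It is not the actual SparkCompiler code: it assumes that some later step looks the operator up in a map like phyToSparkOpMap and dereferences the result. The class MapLookupSketch, its methods, and the name "scope-1" are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

public class MapLookupSketch {
    // Hypothetical stand-in for phyToSparkOpMap (operator names instead of operator objects).
    private final Map<String, String> phyToSparkOp = new HashMap<>();

    // Plays the role of the added line: remember which Spark operator holds the physical operator.
    public void register(String physicalOp, String sparkOp) {
        phyToSparkOp.put(physicalOp, sparkOp);
    }

    // A later step that assumes every visited operator was registered.
    public int sparkOpNameLength(String physicalOp) {
        String sparkOp = phyToSparkOp.get(physicalOp); // null if never registered
        return sparkOp.length();                       // NullPointerException when null
    }

    // Helper for the demo: reports whether the lookup fails with an NPE.
    public static boolean throwsNpe(MapLookupSketch sketch, String physicalOp) {
        try {
            sketch.sparkOpNameLength(physicalOp);
            return false;
        } catch (NullPointerException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        MapLookupSketch sketch = new MapLookupSketch();
        System.out.println(throwsNpe(sketch, "PODistinct")); // operator not registered yet
        sketch.register("PODistinct", "scope-1");
        System.out.println(throwsNpe(sketch, "PODistinct")); // operator registered
    }
}
```

Registering the operator at visit time, as the patch does, keeps any such later lookup from returning null.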

#2 fails because we no longer replace FRJoin with a regular join.
TestPigRunner#simpleMultiQueryTest3.pig
{code}
A = load '" + INPUT_FILE + "' as (a0:int, a1:int, a2:int);
A1 = load '" + INPUT_FILE_2 + "' as (a0:int, a1:int, a2:int);
B = filter A by a0 == 3;
C = filter A by a1 <=5;
D = join C by a0, B by a0, A1 by a0 using 'replicated';
store C into '" + OUTPUT_FILE;
store D into '" + OUTPUT_FILE_2 
{code}

Before, when a regular join was used to implement FRJoin, the Spark plan was:
{noformat}
#--------------------------------------------------
# Spark Plan                                  
#--------------------------------------------------

Spark node scope-521
Store(hdfs://localhost:59787/tmp/temp1693227580/tmp480394712:org.apache.pig.impl.io.InterStorage) - scope-522
|
|---A: New For Each(false,false,false)[bag] - scope-478
    |   |
    |   Cast[int] - scope-470
    |   |
    |   |---Project[bytearray][0] - scope-469
    |   |
    |   Cast[int] - scope-473
    |   |
    |   |---Project[bytearray][1] - scope-472
    |   |
    |   Cast[int] - scope-476
    |   |
    |   |---Project[bytearray][2] - scope-475
    |
    |---A: Load(hdfs://localhost:59787/user/root/input:org.apache.pig.builtin.PigStorage) - scope-468--------

Spark node scope-524
Store(hdfs://localhost:59787/tmp/temp1693227580/tmp-2124870865:org.apache.pig.impl.io.InterStorage) - scope-525
|
|---C: Filter[bag] - scope-482
    |   |
    |   Less Than or Equal[boolean] - scope-485
    |   |
    |   |---Project[int][1] - scope-483
    |   |
    |   |---Constant(5) - scope-484
    |
    |---Load(hdfs://localhost:59787/tmp/temp1693227580/tmp480394712:org.apache.pig.impl.io.InterStorage) - scope-523--------

Spark node scope-527
C: Store(hdfs://localhost:59787/user/root/output:org.apache.pig.builtin.PigStorage) - scope-489
|
|---Load(hdfs://localhost:59787/tmp/temp1693227580/tmp-2124870865:org.apache.pig.impl.io.InterStorage) - scope-526--------

Spark node scope-533
D: Store(hdfs://localhost:59787/user/root/output2:org.apache.pig.builtin.PigStorage) - scope-520
|
|---D: FRJoin[tuple] - scope-512
    |   |
    |   Project[int][0] - scope-509
    |   |
    |   Project[int][0] - scope-510
    |   |
    |   Project[int][0] - scope-511
    |
    |---B: Filter[bag] - scope-494
    |   |   |
    |   |   Equal To[boolean] - scope-497
    |   |   |
    |   |   |---Project[int][0] - scope-495
    |   |   |
    |   |   |---Constant(3) - scope-496
    |   |
    |   |---Load(hdfs://localhost:59787/tmp/temp1693227580/tmp480394712:org.apache.pig.impl.io.InterStorage) - scope-530
    |
    |---A1: New For Each(false,false,false)[bag] - scope-508
    |   |   |
    |   |   Cast[int] - scope-500
    |   |   |
    |   |   |---Project[bytearray][0] - scope-499
    |   |   |
    |   |   Cast[int] - scope-503
    |   |   |
    |   |   |---Project[bytearray][1] - scope-502
    |   |   |
    |   |   Cast[int] - scope-506
    |   |   |
    |   |   |---Project[bytearray][2] - scope-505
    |   |
    |   |---A1: Load(hdfs://localhost:59787/user/root/input2:org.apache.pig.builtin.PigStorage) - scope-498
    |
    |---Load(hdfs://localhost:59787/tmp/temp1693227580/tmp-2124870865:org.apache.pig.impl.io.InterStorage) - scope-528--------

{noformat}
After PIG-4771, the Spark plan is:
{code}
#--------------------------------------------------
# Spark Plan                                  
#--------------------------------------------------

Spark node scope-534
Split - scope-548
|   |
|   Store(hdfs://localhost:48350/tmp/temp649016960/tmp48836938:org.apache.pig.impl.io.InterStorage) - scope-538
|   |
|   |---C: Filter[bag] - scope-495
|       |   |
|       |   Less Than or Equal[boolean] - scope-498
|       |   |
|       |   |---Project[int][1] - scope-496
|       |   |
|       |   |---Constant(5) - scope-497
|   |
|   Store(hdfs://localhost:48350/tmp/temp649016960/tmp804709981:org.apache.pig.impl.io.InterStorage) - scope-546
|   |
|   |---B: Filter[bag] - scope-507
|       |   |
|       |   Equal To[boolean] - scope-510
|       |   |
|       |   |---Project[int][0] - scope-508
|       |   |
|       |   |---Constant(3) - scope-509
|
|---A: New For Each(false,false,false)[bag] - scope-491
    |   |
    |   Cast[int] - scope-483
    |   |
    |   |---Project[bytearray][0] - scope-482
    |   |
    |   Cast[int] - scope-486
    |   |
    |   |---Project[bytearray][1] - scope-485
    |   |
    |   Cast[int] - scope-489
    |   |
    |   |---Project[bytearray][2] - scope-488
    |
    |---A: Load(hdfs://localhost:48350/user/root/input:org.apache.pig.builtin.PigStorage) - scope-481--------

Spark node scope-540
C: Store(hdfs://localhost:48350/user/root/output:org.apache.pig.builtin.PigStorage) - scope-502
|
|---Load(hdfs://localhost:48350/tmp/temp649016960/tmp48836938:org.apache.pig.impl.io.InterStorage) - scope-539--------

Spark node scope-542
D: Store(hdfs://localhost:48350/user/root/output2:org.apache.pig.builtin.PigStorage) - scope-533
|
|---D: FRJoin[tuple] - scope-525
    |   |
    |   Project[int][0] - scope-522
    |   |
    |   Project[int][0] - scope-523
    |   |
    |   Project[int][0] - scope-524
    |
    |---Load(hdfs://localhost:48350/tmp/temp649016960/tmp48836938:org.apache.pig.impl.io.InterStorage) - scope-541--------

Spark node scope-545
Store(hdfs://localhost:48350/tmp/temp649016960/tmp-2036144538:org.apache.pig.impl.io.InterStorage) - scope-547
|
|---A1: New For Each(false,false,false)[bag] - scope-521
    |   |
    |   Cast[int] - scope-513
    |   |
    |   |---Project[bytearray][0] - scope-512
    |   |
    |   Cast[int] - scope-516
    |   |
    |   |---Project[bytearray][1] - scope-515
    |   |
    |   Cast[int] - scope-519
    |   |
    |   |---Project[bytearray][2] - scope-518
    |
    |---A1: Load(hdfs://localhost:48350/user/root/input2:org.apache.pig.builtin.PigStorage) - scope-511--------
{code}

 assertEquals(4, stats.getJobGraph().size()); [code|https://github.com/apache/pig/blob/spark/test/org/apache/pig/test/TestPigRunner.java#L459] fails because there are now 5 stores instead of 4. But even if we change the expected value from 4 to 5, the test still fails at assertEquals(5, inputStats.get(0).getNumberRecords()); [code|https://github.com/apache/pig/blob/spark/test/org/apache/pig/test/TestPigRunner.java#L498]. The number of records in the input file is calculated incorrectly in spark mode in the multiquery case.
I will file a new JIRA to track this, but for now the issue can be avoided by disabling multiquery, as I did in PIG-4988.patch.
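
To make the 4 versus 5 count easy to check against the two plans above, here is a small sketch that counts Store operators in a plan dump. It only assumes, as this comment does, that the job graph size follows the number of stores. The class StoreCountSketch and the shortened plan lines are illustrative: paths are abbreviated, scope ids are taken from the plans.

```java
import java.util.Arrays;
import java.util.List;

public class StoreCountSketch {
    // Counts plan lines that contain a Store operator, e.g. "C: Store(...) - scope-502".
    // Load lines mention "InterStorage" or "PigStorage", which do not contain "Store(".
    public static long countStores(List<String> planLines) {
        return planLines.stream()
                .filter(line -> line.contains("Store("))
                .count();
    }

    public static void main(String[] args) {
        // Abbreviated operator lines from the plan before PIG-4771.
        List<String> before = Arrays.asList(
                "Store(tmp480394712:InterStorage) - scope-522",
                "|---A: Load(input:PigStorage) - scope-468",
                "Store(tmp-2124870865:InterStorage) - scope-525",
                "C: Store(output:PigStorage) - scope-489",
                "D: Store(output2:PigStorage) - scope-520");

        // Abbreviated operator lines from the plan after PIG-4771.
        List<String> after = Arrays.asList(
                "|   Store(tmp48836938:InterStorage) - scope-538",
                "|   Store(tmp804709981:InterStorage) - scope-546",
                "C: Store(output:PigStorage) - scope-502",
                "D: Store(output2:PigStorage) - scope-533",
                "Store(tmp-2036144538:InterStorage) - scope-547",
                "|---A1: Load(input2:PigStorage) - scope-511");

        System.out.println("stores before: " + countStores(before));
        System.out.println("stores after:  " + countStores(after));
    }
}
```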

> Fix unit test failure after PIG-4771's patch was checked in
> -----------------------------------------------------------
>
>                 Key: PIG-4898
>                 URL: https://issues.apache.org/jira/browse/PIG-4898
>             Project: Pig
>          Issue Type: Sub-task
>          Components: spark
>            Reporter: liyunzhang_intel
>            Assignee: liyunzhang_intel
>             Fix For: spark-branch
>
>
> The [latest jenkins|https://builds.apache.org/job/Pig-spark/#328] build shows that the following unit test cases fail:
>  org.apache.pig.test.TestFRJoin.testDistinctFRJoin	
>  org.apache.pig.test.TestPigRunner.simpleMultiQueryTest3



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)