Posted to dev@pig.apache.org by "liyunzhang_intel (JIRA)" <ji...@apache.org> on 2016/05/18 08:48:12 UTC
[jira] [Commented] (PIG-4898) Fix unit test failure after PIG-4771's patch was checked in
[ https://issues.apache.org/jira/browse/PIG-4898?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15288630#comment-15288630 ]
liyunzhang_intel commented on PIG-4898:
---------------------------------------
The following unit tests fail:
1. org.apache.pig.test.TestFRJoin.testDistinctFRJoin
2. org.apache.pig.test.TestPigRunner.simpleMultiQueryTest3
#1 fails because the line marked with {{+}} below is missing from SparkCompiler#visitDistinct, which causes an NPE:
{code}
public void visitDistinct(PODistinct op) throws VisitorException {
    try {
        addToPlan(op);
+       phyToSparkOpMap.put(op, curSparkOp);
    } catch (Exception e) {
        int errCode = 2034;
        ....
{code}
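For illustration only (these are not Pig's real classes), a minimal, self-contained sketch of why the missing {{put}} leads to an NPE: the visitor compiles the operator but never registers it in the operator map, so a later phase that looks the operator up gets {{null}} and dereferences it.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-ins for Pig's PhysicalOperator / SparkOperator types.
public class PhyToSparkOpMapDemo {
    static class PhysicalOperator { final String name; PhysicalOperator(String n) { name = n; } }
    static class SparkOperator { final String id; SparkOperator(String id) { this.id = id; } }

    static final Map<PhysicalOperator, SparkOperator> phyToSparkOpMap = new HashMap<>();

    // Buggy version: the operator is compiled but never registered in the map.
    static void visitDistinctBuggy(PhysicalOperator op, SparkOperator curSparkOp) {
        // addToPlan(op); -- plan bookkeeping omitted; only the map matters here
    }

    // Fixed version, mirroring the patch: register the phy-op -> spark-op mapping.
    static void visitDistinctFixed(PhysicalOperator op, SparkOperator curSparkOp) {
        phyToSparkOpMap.put(op, curSparkOp);
    }

    // A later phase that relies on the mapping being present.
    static String sparkOpIdFor(PhysicalOperator op) {
        return phyToSparkOpMap.get(op).id; // NPE if op was never registered
    }

    public static void main(String[] args) {
        PhysicalOperator distinct = new PhysicalOperator("PODistinct");
        SparkOperator curSparkOp = new SparkOperator("scope-1");

        visitDistinctBuggy(distinct, curSparkOp);
        try {
            sparkOpIdFor(distinct);
            System.out.println("buggy: no NPE");
        } catch (NullPointerException e) {
            System.out.println("buggy: NPE"); // the failure seen in testDistinctFRJoin
        }

        visitDistinctFixed(distinct, curSparkOp);
        System.out.println("fixed: " + sparkOpIdFor(distinct));
    }
}
```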
#2 fails because we no longer replace FRJoin with a regular join. The script in TestPigRunner#simpleMultiQueryTest3 is:
{code}
A = load '" + INPUT_FILE + "' as (a0:int, a1:int, a2:int);
A1 = load '" + INPUT_FILE_2 + "' as (a0:int, a1:int, a2:int);
B = filter A by a0 == 3;
C = filter A by a1 <=5;
D = join C by a0, B by a0, A1 by a0 using 'replicated';
store C into '" + OUTPUT_FILE;
store D into '" + OUTPUT_FILE_2
{code}
Before, when a regular join was used to implement it, the spark plan was:
{noformat}
#--------------------------------------------------
# Spark Plan
#--------------------------------------------------
Spark node scope-521
Store(hdfs://localhost:59787/tmp/temp1693227580/tmp480394712:org.apache.pig.impl.io.InterStorage) - scope-522
|
|---A: New For Each(false,false,false)[bag] - scope-478
| |
| Cast[int] - scope-470
| |
| |---Project[bytearray][0] - scope-469
| |
| Cast[int] - scope-473
| |
| |---Project[bytearray][1] - scope-472
| |
| Cast[int] - scope-476
| |
| |---Project[bytearray][2] - scope-475
|
|---A: Load(hdfs://localhost:59787/user/root/input:org.apache.pig.builtin.PigStorage) - scope-468--------
Spark node scope-524
Store(hdfs://localhost:59787/tmp/temp1693227580/tmp-2124870865:org.apache.pig.impl.io.InterStorage) - scope-525
|
|---C: Filter[bag] - scope-482
| |
| Less Than or Equal[boolean] - scope-485
| |
| |---Project[int][1] - scope-483
| |
| |---Constant(5) - scope-484
|
|---Load(hdfs://localhost:59787/tmp/temp1693227580/tmp480394712:org.apache.pig.impl.io.InterStorage) - scope-523--------
Spark node scope-527
C: Store(hdfs://localhost:59787/user/root/output:org.apache.pig.builtin.PigStorage) - scope-489
|
|---Load(hdfs://localhost:59787/tmp/temp1693227580/tmp-2124870865:org.apache.pig.impl.io.InterStorage) - scope-526--------
Spark node scope-533
D: Store(hdfs://localhost:59787/user/root/output2:org.apache.pig.builtin.PigStorage) - scope-520
|
|---D: FRJoin[tuple] - scope-512
| |
| Project[int][0] - scope-509
| |
| Project[int][0] - scope-510
| |
| Project[int][0] - scope-511
|
|---B: Filter[bag] - scope-494
| | |
| | Equal To[boolean] - scope-497
| | |
| | |---Project[int][0] - scope-495
| | |
| | |---Constant(3) - scope-496
| |
| |---Load(hdfs://localhost:59787/tmp/temp1693227580/tmp480394712:org.apache.pig.impl.io.InterStorage) - scope-530
|
|---A1: New For Each(false,false,false)[bag] - scope-508
| | |
| | Cast[int] - scope-500
| | |
| | |---Project[bytearray][0] - scope-499
| | |
| | Cast[int] - scope-503
| | |
| | |---Project[bytearray][1] - scope-502
| | |
| | Cast[int] - scope-506
| | |
| | |---Project[bytearray][2] - scope-505
| |
| |---A1: Load(hdfs://localhost:59787/user/root/input2:org.apache.pig.builtin.PigStorage) - scope-498
|
|---Load(hdfs://localhost:59787/tmp/temp1693227580/tmp-2124870865:org.apache.pig.impl.io.InterStorage) - scope-528--------
{noformat}
After PIG-4771, the spark plan is:
{code}
#--------------------------------------------------
# Spark Plan
#--------------------------------------------------
Spark node scope-534
Split - scope-548
| |
| Store(hdfs://localhost:48350/tmp/temp649016960/tmp48836938:org.apache.pig.impl.io.InterStorage) - scope-538
| |
| |---C: Filter[bag] - scope-495
| | |
| | Less Than or Equal[boolean] - scope-498
| | |
| | |---Project[int][1] - scope-496
| | |
| | |---Constant(5) - scope-497
| |
| Store(hdfs://localhost:48350/tmp/temp649016960/tmp804709981:org.apache.pig.impl.io.InterStorage) - scope-546
| |
| |---B: Filter[bag] - scope-507
| | |
| | Equal To[boolean] - scope-510
| | |
| | |---Project[int][0] - scope-508
| | |
| | |---Constant(3) - scope-509
|
|---A: New For Each(false,false,false)[bag] - scope-491
| |
| Cast[int] - scope-483
| |
| |---Project[bytearray][0] - scope-482
| |
| Cast[int] - scope-486
| |
| |---Project[bytearray][1] - scope-485
| |
| Cast[int] - scope-489
| |
| |---Project[bytearray][2] - scope-488
|
|---A: Load(hdfs://localhost:48350/user/root/input:org.apache.pig.builtin.PigStorage) - scope-481--------
Spark node scope-540
C: Store(hdfs://localhost:48350/user/root/output:org.apache.pig.builtin.PigStorage) - scope-502
|
|---Load(hdfs://localhost:48350/tmp/temp649016960/tmp48836938:org.apache.pig.impl.io.InterStorage) - scope-539--------
Spark node scope-542
D: Store(hdfs://localhost:48350/user/root/output2:org.apache.pig.builtin.PigStorage) - scope-533
|
|---D: FRJoin[tuple] - scope-525
| |
| Project[int][0] - scope-522
| |
| Project[int][0] - scope-523
| |
| Project[int][0] - scope-524
|
|---Load(hdfs://localhost:48350/tmp/temp649016960/tmp48836938:org.apache.pig.impl.io.InterStorage) - scope-541--------
Spark node scope-545
Store(hdfs://localhost:48350/tmp/temp649016960/tmp-2036144538:org.apache.pig.impl.io.InterStorage) - scope-547
|
|---A1: New For Each(false,false,false)[bag] - scope-521
| |
| Cast[int] - scope-513
| |
| |---Project[bytearray][0] - scope-512
| |
| Cast[int] - scope-516
| |
| |---Project[bytearray][1] - scope-515
| |
| Cast[int] - scope-519
| |
| |---Project[bytearray][2] - scope-518
|
|---A1: Load(hdfs://localhost:48350/user/root/input2:org.apache.pig.builtin.PigStorage) - scope-511--------
{code}
assertEquals(4, stats.getJobGraph().size()); [code|https://github.com/apache/pig/blob/spark/test/org/apache/pig/test/TestPigRunner.java#L459] fails because there are now 5 stores, not 4. But even if we change the value from 4 to 5, the test still fails at assertEquals(5, inputStats.get(0).getNumberRecords()); [code|https://github.com/apache/pig/blob/spark/test/org/apache/pig/test/TestPigRunner.java#L498]. The number of records of the input file is calculated wrongly in spark mode in the multiquery case.
I will file a new JIRA to record this, but for now, if multiquery is not enabled, as was done in PIG-4898.patch, this issue can be avoided.
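As a workaround sketch (a config fragment, not part of the patch itself): Pig's standard {{-no_multiquery}} flag (alias {{-M}}) or the {{opt.multiquery}} property turns the multiquery optimization off, which sidesteps the wrong record count. The script name {{script.pig}} is a placeholder.

```shell
# Disable multiquery optimization when running on the spark engine
pig -x spark -no_multiquery script.pig

# or, equivalently, via the property:
pig -x spark -Dopt.multiquery=false script.pig
```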
> Fix unit test failure after PIG-4771's patch was checked in
> -----------------------------------------------------------
>
> Key: PIG-4898
> URL: https://issues.apache.org/jira/browse/PIG-4898
> Project: Pig
> Issue Type: Sub-task
> Components: spark
> Reporter: liyunzhang_intel
> Assignee: liyunzhang_intel
> Fix For: spark-branch
>
>
> Now in the [lastest jenkins|https://builds.apache.org/job/Pig-spark/#328], it shows that following unit test cases fail:
> org.apache.pig.test.TestFRJoin.testDistinctFRJoin
> org.apache.pig.test.TestPigRunner.simpleMultiQueryTest3
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)